Convert all XML Special Characters back to Regular Characters (Within SQL)

Question

How do I convert all Special Characters in XML to the ASCII Value?

For example:

DECLARE @xml XML = (SELECT 'abc & xyz><' FOR XML PATH(''))
SELECT @xml --@xml is now 'abc &amp; xyz&gt;&lt;'

I wish to convert back to the ASCII varchar value (i.e. 'abc & xyz><'). The only way I have found is to manually replace all special XML characters, i.e.:

SELECT REPLACE(REPLACE(REPLACE(CAST(@xml AS VARCHAR(MAX)),'&amp;','&'),'&gt;','>'),'&lt;','<');
--RETURNS 'abc & xyz><'

However, this is a very inelegant solution, and does not handle all XML Character conversions. Is there any built-in SQL Server function to do this?

Recommended answer

Update: I am leaving my previous solution available below, but I came up with a better one based on what Jeremy posted.

New solution:

DECLARE @xml XML = 'abc &amp; xyz &gt;&lt;';

SELECT newstring = ((SELECT @xml FOR XML PATH(''), TYPE).value('.', 'varchar(8000)'));

Returns:

abc & xyz ><
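
When the encoded text is already in an xml variable, or in a string that happens to be well-formed XML, calling the xml type's value() method directly may be enough. The following is a minimal sketch rather than part of the original answer; the variable and column names are illustrative only, and the CAST in the second case will raise an error if the string is not well-formed XML (a bare & would do it).

```sql
-- Case 1: the value is already typed as xml
DECLARE @xml XML = 'abc &amp; xyz &gt;&lt;';
SELECT decoded = @xml.value('.', 'nvarchar(max)');
-- returns: abc & xyz ><

-- Case 2: the encoded text is stored as a string (assumed to be well-formed XML)
DECLARE @encoded VARCHAR(8000) = 'ABC&amp;123&lt;xxx&gt;';
SELECT decoded = CAST(@encoded AS XML).value('.', 'nvarchar(max)');
-- returns: ABC&123<xxx>
```

Because the XML parser does the decoding, this route should also cover numeric character references such as &#x09; without listing them one by one.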

Old solution (still works):

I have a couple of functions for this type of thing. First you need rangeAB and CharMapAB.

RangeAB

CREATE FUNCTION dbo.rangeAB
(
  @low  bigint, 
  @high bigint, 
  @gap  bigint,
  @row1 bit
)
/****************************************************************************************
[Purpose]:
 Creates up to 531,441,000,000 sequential integers beginning with @low and ending 
 with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs 
 to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweaks, 
 enhancements and added functionality. The logic for getting rn to begin at 0 or 1 
 comes from Jeff Moden's fnTally function. 

 The name "range" was chosen because the function is similar to Clojure's range function.
 The name "rangeAB" is used because "range" is a reserved SQL keyword.

[Author]: Alan Burstein

[Compatibility]: 
 SQL Server 2008+ and Azure SQL Database

[Syntax]:
 SELECT r.RN, r.OP, r.N1, r.N2
 FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;

[Parameters]:
 @low  = a bigint that represents the lowest value for n1.
 @high = a bigint that represents the highest value for n1.
 @gap  = a bigint that represents how much n1 and n2 will increase each row; @gap also
         represents the difference between n1 and n2.
 @row1 = a bit that represents the first value of rn. When @row1 = 0 then rn begins
         at 0, when @row1 = 1 then rn will begin at 1.

[Returns]:
 Inline Table Valued Function returns:
 rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can 
      start at 0 or 1 which is dictated by @row1.
 op = bigint; returns the "opposite" number that relates to rn. When rn begins with 0 and
      ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins
      with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...
 n1 = bigint; a sequential number starting at the value of @low and incrementing by the
      value of @gap while it remains less than or equal to the value of @high.
 n2 = bigint; a sequential number starting at the value of @low+@gap and incrementing 
      by the value of @gap.

[Dependencies]:
N/A

[Developer Notes]:

 1. The lowest and highest possible numbers returned are whatever is allowable by a 
    bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3). 
 2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row
    unless it's used in a query where a filter is applied to rn.
 3. @gap must be greater than 0 or the function will not return any rows.
 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of
    rows returned minus 1
 5. If all you need is a sequential set beginning at 0 or 1 then, for best performance
    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any 
    number other than 0 or 1 or if you need a gap between your sequence of numbers. 
 6. Although @gap is a bigint it must be a positive integer or the function will
    not return any rows.
 7. The function will not return any rows when one of the following conditions is true:
      * any of the input parameters are NULL
      * @high is less than @low 
      * @gap is not greater than 0
    To force the function to return all NULLs instead of not returning anything you can
    add the following code to the end of the query:

      UNION ALL 
      SELECT NULL, NULL, NULL, NULL
      WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)

    This code was excluded as it adds a ~5% performance penalty.
 8. There is no performance penalty for sorting by rn ASC; there is a large performance 
    penalty for sorting in descending order, whether @row1 is 1 or 0.
    If you need a descending sort then use op in place of rn and sort by rn ASC. 

Best Practices:
--===== 1. Using RN (rownumber)
 -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
 SELECT RN FROM dbo.rangeAB(1,5,1,1);
 -- (1.2) The best way to get the numbers 0,1,2...@high (e.g. 0 to 5):
 SELECT RN FROM dbo.rangeAB(0,5,1,0);

--===== 2. Using OP for descending sorts without a performance penalty
 -- (2.1) The best way to get the numbers @high...3,2,1 (e.g. 5 to 1):
 SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;
 -- (2.2) The best way to get the numbers @high-1...2,1,0 (e.g. 5 to 0):
 SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;

--===== 3. Using N1
 -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
 SELECT N1 FROM dbo.rangeAB(-3,3,1,1);
 -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);
 -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);

--===== 4. Using N2 and @gap
 -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
 SELECT N1 FROM dbo.rangeAB(0,100,10,1);
 -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
 --       For example, to get (0,10),(10,20),(20,30).... (90,100):
 SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);
 -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:
 SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);

[Examples]:
--===== 1. Generating Sample data (using rangeAB to create "dummy rows")
 -- The query below will generate 10,000 ids and random numbers between 50,001 and 500,000
 SELECT
   someId     = r.rn,
   someNumber = ABS(CHECKSUM(NEWID())%450000)+50001 
 FROM dbo.rangeAB(1,10000,1,1) r;

--===== 2. Create a series of dates; rn is 0 to include the first date in the series
 DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';

 SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)
 FROM dbo.rangeAB(0, DATEDIFF(dd,@startdate,@enddate),1,0) r; -- @low = 0 so rn runs 0..30 and @enddate is included
 GO

--===== 3. Splitting (tokenizing) a string with fixed sized items
 -- given a delimited string of identifiers that are always 7 characters long
 DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';

 SELECT
   itemNumber = r.rn, -- item's ordinal position 
   itemIndex  = r.n1, -- item's position in the string (its CHARINDEX value)
   item       = SUBSTRING(@string, r.n1, 7) -- item (token)
 FROM dbo.rangeAB(1, LEN(@string), 8,1)  r;
 GO

--===== 4. Splitting (tokenizing) a string with random delimiters
 DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';

 SELECT
   itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position 
   itemIndex  = r.n1+1, -- item's position in the string (its CHARINDEX value)
   item       = SUBSTRING
               (
                 @string,
                 r.n1+1,
                 ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)
               ) -- item (token)
 FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r
 WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;
 -- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/

--===== 5. Grouping by weekly intervals
 -- 5.1. how to create a series of start/end dates between @startDate & @endDate
 DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';
 SELECT 
   WeekNbr   = r.RN,
   WeekStart = DATEADD(DAY,r.N1,@StartDate), 
   WeekEnd   = DATEADD(DAY,r.N2-1,@StartDate)
 FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r;
 GO

 -- 5.2. LEFT JOIN to the weekly interval table
 BEGIN
  DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';
  -- sample data 
  DECLARE @loans TABLE (loID INT, lockDate DATE);
  INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)
  FROM dbo.rangeAB(1,50,1,1) r;

  -- solution 
  SELECT 
    WeekNbr   = r.RN,
    WeekStart = dt.WeekStart, 
    WeekEnd   = dt.WeekEnd,
    total     = COUNT(l.lockDate)
  FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r
  CROSS APPLY (VALUES (
    CAST(DATEADD(DAY,r.N1,@StartDate) AS DATE), 
    CAST(DATEADD(DAY,r.N2-1,@StartDate) AS DATE))) dt(WeekStart,WeekEnd)
  LEFT JOIN @loans l ON l.lockDate BETWEEN  dt.WeekStart AND dt.WeekEnd
  GROUP BY r.RN, dt.WeekStart, dt.WeekEnd ;
 END;

--===== 6. Identify the first vowel and last vowel in a string along with their positions
 DECLARE @string VARCHAR(200) = 'This string has vowels';

 SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)
 FROM dbo.rangeAB(1,LEN(@string),1,1) r
 WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
 ORDER BY r.rn;

 -- Last vowel: to avoid a sort in the execution plan we'll use op instead of rn
 SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)
 FROM dbo.rangeAB(1,LEN(@string),1,1) r
 WHERE SUBSTRING(@string,r.op,1) LIKE '%[aeiou]%' -- test the character at op, not rn
 ORDER BY r.rn;

---------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20140518 - Initial Development - Alan Burstein
 Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section
                   - Alan Burstein
 Rev 02 - 20180613 - Complete re-design including opposite number column (op)
 Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS 
(
  SELECT 1
  FROM (VALUES
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
   (0),(0)) T(N) -- 90 values 
),
L2(N)  AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT  
  r.RN,
  r.OP,
  r.N1,
  r.N2
FROM
(
  SELECT
    RN = 0,
    OP = (@high-@low)/@gap,
    N1 = @low,
    N2 = @gap+@low
  WHERE @row1 = 0
  UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
  SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))
    RN = i.rn,
    OP = (@high-@low)/@gap+(2*@row1)-i.rn,
    N1 = (i.rn-@row1)*@gap+@low,
    N2 = (i.rn-(@row1-1))*@gap+@low
  FROM iTally AS i
  ORDER BY rn
) AS r
WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;
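
Once the function exists, a quick call can help confirm how the four output columns relate to each other. The call below is an illustration only, not part of the original answer; the rows in the comment were worked out by hand from the function body above.

```sql
-- Multiples of 5 from 0 to 20, with rn starting at 1
SELECT r.RN, r.OP, r.N1, r.N2
FROM dbo.rangeAB(0, 20, 5, 1) AS r
ORDER BY r.RN;
-- RN  OP  N1  N2
-- 1   5   0   5
-- 2   4   5   10
-- 3   3   10  15
-- 4   2   15  20
-- 5   1   20  25
```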

CharMapAB

CREATE FUNCTION dbo.charmapAB
(
  @asciiOnly BIT,
  @xmlCheck  BIT
) 
/*****************************************************************************************
[Purpose]:
 Generates a table containing the numbers 1 through 65535 along with the
 corresponding CHAR(N) value (e.g. CHAR(65) = "A") and/or UNICODE value (e.g. 
 NCHAR(324) = "ń", aka the Latin minuscule ń). 

 The ascii_xml_special and unicode_xml_special columns are bits that indicate if 
 the character is an ASCII or UNICODE reserved XML character. The ascii_xml and 
 unicode_xml columns show what will be displayed when the character is output 
 in XML format (e.g. SELECT CAST('>' AS XML) will return "&gt;"). 

 is_ascii_whitespace indicates if the character is a "whitespace character" (such
 as CHAR(9), CHAR(32) and CHAR(160)). abin is the character's ascii binary value 
 and ubin is the character's unicode binary value. 

[Developer Notes]:
 1. Have not determined UNICODE whitespace characters. 

[Examples]:
--===== Get a list of ASCII whitespace characters
  SELECT cm.* -- WhiteSpaceCharacters = 'CHAR('+CAST(n AS varchar(3))+')'
  FROM   dbo.CharmapAB(0,0) AS cm;

  SELECT cm.* -- WhiteSpaceCharacters = 'CHAR('+CAST(n AS varchar(3))+')'
  FROM   dbo.CharmapAB(1,1) AS cm;

  SELECT cm.* -- WhiteSpaceCharacters = 'CHAR('+CAST(n AS varchar(3))+')'
  FROM  dbo.CharmapAB(0,1) AS cm
  WHERE cm.char_nbr IN (9,10,13,32,38,60,62);
-----------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - May 2015 - Initial Development - Alan Burstein
 Rev 01 - 20150819 changed whitespace val, column names, added quoted_val
        - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH rowz(N) AS (SELECT CASE @asciiOnly WHEN 0 THEN 255 ELSE 65535 END) -- as written: 0 returns 1-255, 1 returns 1-65535
SELECT
char_nbr        = i.RN, 
ascii_val       = CHAR(cs.RN),
unicode_val     = u.unicode_val,
quoted_val      = uq.quoted_val,
is_unicode_only = SIGN(i.RN&256),
is_acsii_ws     = CASE WHEN cs.RN IN ((2),(9),(10),(13),(32),(160)) THEN 1 ELSE 0 END,
is_ascii_blank  = CASE WHEN cs.RN BETWEEN 28  AND 31 
                         OR cs.RN BETWEEN 129 AND 159 THEN 1 ELSE 0 END,
unicode_xml_val = x.unicode_xml_val,
bin             = CAST(NCHAR(cs.RN) AS varbinary)
FROM rowz
CROSS APPLY dbo.rangeAB(1,rowz.N,1,1)       AS i
CROSS APPLY (VALUES(CHECKSUM(i.RN)))        AS cs(RN)
CROSS APPLY (SELECT TOP (@xmlCheck*1) NCHAR(cs.RN) 
             WHERE @xmlCheck = 1 
             FOR XML PATH(''))              AS x(unicode_xml_val)
CROSS APPLY (VALUES(NCHAR(cs.RN)))          AS u(unicode_val)  
CROSS APPLY (VALUES('"'+u.unicode_val+'"')) AS uq(quoted_val);

CharmapAB will help you identify which characters are reserved in XML.

If you run this query you can identify which ASCII characters are "XML protected":

SELECT cm.*
FROM  dbo.CharmapAB(0,1) AS cm;

Returns (truncated for brevity):

char_nbr  ascii_val unicode_val quoted_val is_unicode_only      is_acsii_ws is_ascii_blank unicode_xml_val      bin
--------- --------- ----------- ---------- -------------------- ----------- -------------- -------------------- ------
1                             ""        0                    0           0              &#x01;               0x0100
2                             ""        0                    1           0              &#x02;               0x0200
....
32                              " "        0                    1           0              &#x20;               0x2000
33        !         !           "!"        0                    0           0              !                    0x2100
34        "         "           """        0                    0           0              "                    0x2200
35        #         #           "#"        0                    0           0              #                    0x2300
36        $         $           "$"        0                    0           0              $                    0x2400
37        %         %           "%"        0                    0           0              %                    0x2500
38        &         &           "&"        0                    0           0              &amp;                0x2600
39        '         '           "'"        0                    0           0              '                    0x2700
...
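
To list only the characters that XML serialization actually changes, the output can be filtered on the length of unicode_xml_val. This is a minimal sketch, assuming dbo.rangeAB and dbo.CharmapAB were created exactly as shown above; the DATALENGTH filter is an added shortcut and not part of the functions themselves.

```sql
-- unicode_xml_val is nvarchar, so one unchanged character is 2 bytes;
-- anything longer was written out as an entity such as &amp; or &#x0D;
SELECT cm.char_nbr, cm.unicode_xml_val
FROM   dbo.CharmapAB(0,1) AS cm
WHERE  DATALENGTH(cm.unicode_xml_val) > 2
ORDER BY cm.char_nbr;
```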

My experience has been that, of the first 31 characters, only char(9), char(10) and char(13) (tab, line feed and carriage return) are ever used. Beyond those, the ones that matter are char(32), char(38), char(60) and char(62), which are: space, ampersand (&), less than ("<") and greater than (">"). This query will likely be enough to get you the characters you need:

DECLARE @yourstring VARCHAR(8000) = 'ABC&amp;123&lt;xxx&gt;'

SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@yourstring,
  '&#x09;', CHAR(9)),
  '&#x0A;', CHAR(10)),
  '&#x0D;', CHAR(13)),
  '&#x20;', CHAR(32)),
  '&lt;', CHAR(60)),
  '&gt;', CHAR(62)),
  '&amp;', CHAR(38)); -- decode &amp; last so text such as '&amp;lt;' is not decoded twice

Returns: ABC&123<xxx>

You can use CharMapAB to update this as needed.
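
One detail worth checking in any chained-REPLACE decoder is where the &amp; replacement sits in the chain. If &amp; is decoded before the other entities, text that was escaped twice gets decoded twice. The snippet below is a small illustration with a made-up input value, not something taken from the original answer.

```sql
-- '&amp;lt;' is the encoded form of the literal text '&lt;'
DECLARE @encoded VARCHAR(100) = '&amp;lt;';

SELECT
  amp_first = REPLACE(REPLACE(@encoded, '&amp;', '&'), '&lt;', '<'), -- '<'    (decoded twice)
  amp_last  = REPLACE(REPLACE(@encoded, '&lt;', '<'), '&amp;', '&'); -- '&lt;' (decoded once)
```

Decoding &amp; as the outermost (last) REPLACE avoids the double decode.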
