<i id='pydWw'><tr id='pydWw'><dt id='pydWw'><q id='pydWw'><span id='pydWw'><b id='pydWw'><form id='pydWw'><ins id='pydWw'></ins><ul id='pydWw'></ul><sub id='pydWw'></sub></form><legend id='pydWw'></legend><bdo id='pydWw'><pre id='pydWw'><center id='pydWw'></center></pre></bdo></b><th id='pydWw'></th></span></q></dt></tr></i><div id='pydWw'><tfoot id='pydWw'></tfoot><dl id='pydWw'><fieldset id='pydWw'></fieldset></dl></div>

  1. <small id='pydWw'></small><noframes id='pydWw'>

  2. <legend id='pydWw'><style id='pydWw'><dir id='pydWw'><q id='pydWw'></q></dir></style></legend>
    • <bdo id='pydWw'></bdo><ul id='pydWw'></ul>
  3. <tfoot id='pydWw'></tfoot>
    1. 使用 sql/plsql,你如何找出文本使用的字符集?

      Using sql/plsql, how do you find out which character set a text uses?(使用 sql/plsql,你如何找出文本使用的字符集?)

      <tfoot id='oaOy7'></tfoot>

      <small id='oaOy7'></small><noframes id='oaOy7'>

    2. <legend id='oaOy7'><style id='oaOy7'><dir id='oaOy7'><q id='oaOy7'></q></dir></style></legend>
    3. <i id='oaOy7'><tr id='oaOy7'><dt id='oaOy7'><q id='oaOy7'><span id='oaOy7'><b id='oaOy7'><form id='oaOy7'><ins id='oaOy7'></ins><ul id='oaOy7'></ul><sub id='oaOy7'></sub></form><legend id='oaOy7'></legend><bdo id='oaOy7'><pre id='oaOy7'><center id='oaOy7'></center></pre></bdo></b><th id='oaOy7'></th></span></q></dt></tr></i><div id='oaOy7'><tfoot id='oaOy7'></tfoot><dl id='oaOy7'><fieldset id='oaOy7'></fieldset></dl></div>
        <tbody id='oaOy7'></tbody>

            <bdo id='oaOy7'></bdo><ul id='oaOy7'></ul>

                本文介绍了使用 sql/plsql,你如何找出文本使用的字符集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

                问题描述

                我有一个 Oracle 数据库,用于存储来自世界各地的不同语言的文档内容.文档存储在带有 BLOB 列的表中,该列存储文档的内容.

                I have an Oracle db which stores the content of documents originating from all over the world, with different languages. The documents are stored in a table with a BLOB column which stores the documents' content.

                我想通过 Oracle 过程找出每个文档的字符集.我不想使用实用程序 CSSCAN,因为您似乎必须在程序之外的单独会话中使用它.

                I want to find out what the char set is for every doc, with an Oracle procedure. I don't want to use the utility CSSCAN since it seems you have to use it in a separate session, outside of your procedure.

                感谢您的帮助!

                推荐答案

                Oracle Globalization Development Kit 可以检测字符集.

                GDK 包含在 Oracle 中,但默认情况下未安装在数据库中.要将 .jar 文件加载到数据库中,请在 Oracle 中找到 jlib 目录home 并运行此操作系统命令:

                The GDK is included with Oracle but it is not installed in the database by default. To load the .jar files into the database find the jlib directory in the Oracle home and run this operating system command:

                loadjava -u USER_NAME@SID orai18n.jar orai18n-collation.jar orai18n-lcsd.jar orai18n-mapping.jar orai18n-net.jar orai18n-servlet.jar orai18n-tools.jar orai18n-translation.jar orai18n-utility.jar
                

                需要一些额外的 Java 权限,即使您的用户具有 DBA.运行此命令,然后重新连接:

                Some extra Java privileges are needed, even if your user has DBA. Run this command and then re-connect:

                exec dbms_java.grant_permission( 'YOUR_USER_NAME', 'SYS:java.lang.RuntimePermission', 'getClassLoader', '' );
                

                创建一个 Java 类来进行检测.下面是一个非常简单的例子,它返回一个字符串的最佳猜测:

                Create a Java class to do the detection. Below is a very simple example that returns the best guess for a string:

                create or replace and compile java source named "Character_Set_Detector"
                as
                import oracle.i18n.lcsd.*;
                import java.sql.*;
                import java.io.IOException;
                public class Character_Set_Detector
                {
                    public static String detect(Blob some_blob) throws SQLException, IOException
                    {
                        LCSDetector detector = new LCSDetector();
                        detector.detect(some_blob.getBinaryStream());
                        LCSDResultSet detector_results = detector.getResult();
                        return detector_results.getORACharacterSet();
                    }
                }
                /
                

                将 Java 类包装在 PL/SQL 函数中:

                Wrap the Java class in a PL/SQL function:

                --Wrap the Java class in a PL/SQL function:
                create or replace function detect_character_set(some_blob blob)
                return varchar2
                as language java
                name 'Character_Set_Detector.detect(java.sql.Blob) return java.lang.String';
                /
                

                我通过将字符串翻译成不同的语言、使用文本编辑器将文本保存为不同的编码、使用十六进制编辑器打开文件并将十六进制转换为 BLOB 来模拟不同的字符集:

                I simulated different character sets by translating a string into different languages, saving the text as different encodings with a text editor, opening the file with hex editor, and converting the hex into a BLOB:

                --UTF8
                --The quick brown fox jumps over the lazy dog
                select 1 id, detect_character_set(hextoraw('54686520717569636b2062726f776e20666f78206a756d7073206f76657220746865206c617a7920646f67')) character_set from dual union all
                --Western European (ISO-8859-1)
                --El zorro marrón rápido salta sobre el perro perezoso
                select 2 id, detect_character_set(hextoraw('456c207a6f72726f206d617272f36e2072e17069646f2073616c746120736f62726520656c20706572726f20706572657a6f736f')) from dual union all
                --Chinese Simplified (GBK)
                --敏捷的棕色狐狸跳过懒狗
                select 3 id, detect_character_set(hextoraw('c3f4bdddb5c4d7d8c9abbafcc0eaccf8b9fdc0c1b9b7')) from dual union all
                --Western European (Windows-1252)
                --Der schnelle braune Fuchs springt über den faulen Hund
                select 4 id, detect_character_set(hextoraw('446572207363686e656c6c6520627261756e6520467563687320737072696e677420fc6265722064656e206661756c656e2048756e64')) from dual union all
                --Cyrillic (KOI8-R)
                --Быстрая коричневая лиса прыгает через ленивую собаку
                select 5 id, detect_character_set(hextoraw('e2d9d3d4d2c1d120cbcfd2c9decec5d7c1d120ccc9d3c120d0d2d9c7c1c5d420dec5d2c5da20ccc5cec9d7d5c020d3cfc2c1cbd5')) from dual;
                
                ID  CHARACTER_SET
                --  -------------
                1   US7ASCII
                2   WE8ISO8859P1
                3   ZHS16CGB231280
                4   WE8ISO8859P1
                5   CL8KOI8R
                

                这个简单的例子效果很好,但我不知道它对现实世界的文件效果如何.GDK中有很多功能,上面的代码只是一个简单的起点.只需稍作改动,代码也可以检测语言,如我的回答此处所示.

                That trivial example works well but I don't know how well it will work with real-world files. There are a lot of features in the GDK, the above code is only a simple starting point. With only minor changes the code can also detect languages as demonstrated in my answer here.

                这篇关于使用 sql/plsql,你如何找出文本使用的字符集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

                本站部分内容来源互联网,如果有图片或者内容侵犯了您的权益,请联系我们,我们会在确认后第一时间进行删除!

                相关文档推荐

                How do I split flat file data and load into parent-child tables in database?(如何拆分平面文件数据并加载到数据库中的父子表中?)
                How to import CSV into sqlite using RSqlite?(如何使用 RSqlite 将 CSV 导入 sqlite?)
                Import CSV to Update rows in table(导入 CSV 以更新表中的行)
                Importing MaxMind#39;s GeoLite2 to MySQL(将 MaxMind 的 GeoLite2 导入 MySQL)
                Import / Export database with SQL Server Server Management Studio(使用 SQL Server Server Management Studio 导入/导出数据库)
                How do you import a large MS SQL .sql file?(如何导入大型 MS SQL .sql 文件?)
                <tfoot id='0mB4m'></tfoot>

                    <i id='0mB4m'><tr id='0mB4m'><dt id='0mB4m'><q id='0mB4m'><span id='0mB4m'><b id='0mB4m'><form id='0mB4m'><ins id='0mB4m'></ins><ul id='0mB4m'></ul><sub id='0mB4m'></sub></form><legend id='0mB4m'></legend><bdo id='0mB4m'><pre id='0mB4m'><center id='0mB4m'></center></pre></bdo></b><th id='0mB4m'></th></span></q></dt></tr></i><div id='0mB4m'><tfoot id='0mB4m'></tfoot><dl id='0mB4m'><fieldset id='0mB4m'></fieldset></dl></div>

                      <bdo id='0mB4m'></bdo><ul id='0mB4m'></ul>

                      1. <legend id='0mB4m'><style id='0mB4m'><dir id='0mB4m'><q id='0mB4m'></q></dir></style></legend>

                        • <small id='0mB4m'></small><noframes id='0mB4m'>

                            <tbody id='0mB4m'></tbody>