Execute a query on SQL Server using Spark SQL

Question

I am trying to get the row count and column count of every table in a SQL Server schema using Spark SQL.

When I execute the query below using sqoop, it gives me the correct results.

sqoop eval --connect "jdbc:sqlserver://<hostname>;database=<dbname>" \
--username=<username> --password=<pwd> \
--query """SELECT 
ta.name TableName ,
pa.rows RowCnt, 
COUNT(ins.COLUMN_NAME) ColCnt FROM <db>.sys.tables ta INNER JOIN 
<db>.sys.partitions pa ON pa.OBJECT_ID = ta.OBJECT_ID INNER JOIN 
<db>.sys.schemas sc ON ta.schema_id = sc.schema_id join 
<db>.INFORMATION_SCHEMA.COLUMNS ins on ins.TABLE_SCHEMA =sc.name and ins.TABLE_NAME=ta.name 
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0) and sc.name ='<schema>' GROUP BY sc.name, ta.name, pa.rows order by 
TableName"""

But when I try to execute the same query from Spark SQL, I get the error "com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WHERE'". If anyone has an idea about this error, please help me out.

Below is the Spark SQL command I executed, in a shell started with spark-shell --jars "/var/lib/sqoop/sqljdbc42.jar":

sqlContext.read.format("jdbc").option("url", "jdbc:sqlserver://<hostname>;database=<dbname>;user=<user>;password=<pwd>").option("dbtable", """(SELECT 
ta.name TableName ,pa.rows RowCnt, 
COUNT(ins.COLUMN_NAME) ColCnt FROM <db>.sys.tables ta INNER JOIN 
<db>.sys.partitions pa ON pa.OBJECT_ID = ta.OBJECT_ID INNER JOIN 
<db>.sys.schemas sc ON ta.schema_id = sc.schema_id join 
<db>.INFORMATION_SCHEMA.COLUMNS ins on ins.TABLE_SCHEMA =sc.name and ins.TABLE_NAME=ta.name 
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0) and sc.name ='<schema>' GROUP BY sc.name,ta.name, pa.rows)""").option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").load()

Expected output:

TableName, RowCnt, ColCnt

table A, 62, 30

table B, 3846, 76

Recommended answer

The problem in your Spark SQL command is with the dbtable option.

The dbtable option accepts anything that is valid in the FROM clause of a SQL query. For example, instead of a table name you can pass a subquery in parentheses. However, a parenthesized subquery must have an alias: Spark places the dbtable value directly after FROM in the statements it sends (for example SELECT * FROM <dbtable> WHERE 1=0 to resolve the schema), and SQL Server rejects a derived table without an alias, which is why the error points at the keyword 'WHERE'. Your command should therefore be modified as follows:

sqlContext
.read
.format("jdbc")
.option("url", "jdbc:sqlserver://<hostname>;database=<dbname>;user=<user>;password=<pwd>")
.option("dbtable", 
    """(SELECT 
    ta.name TableName ,
    pa.rows RowCnt, 
    COUNT(ins.COLUMN_NAME) ColCnt 
    FROM <db>.sys.tables ta 
    INNER JOIN 
    <db>.sys.partitions pa 
    ON pa.OBJECT_ID = ta.OBJECT_ID 
    INNER JOIN 
    <db>.sys.schemas sc 
    ON ta.schema_id = sc.schema_id 
    JOIN 
    <db>.INFORMATION_SCHEMA.COLUMNS ins 
    ON ins.TABLE_SCHEMA = sc.name and ins.TABLE_NAME = ta.name 
    WHERE ta.is_ms_shipped = 0 
     AND pa.index_id IN (1,0) 
     AND sc.name ='<schema>' 
    GROUP BY sc.name,ta.name, pa.rows) as TEMP""")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.load()

Just a hunch. Hope this helps!
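
To see why the alias matters, here is a minimal sketch in plain Scala (no Spark or database needed) that builds the kind of statement sent to SQL Server with and without the alias. It assumes the schema-resolution statement has the shape SELECT * FROM <dbtable> WHERE 1=0; the object name DbtableProbe and the helpers schemaProbe and asDerivedTable are made up for illustration and are not part of any Spark API.

```scala
object DbtableProbe {
  // Assumption: mirrors the shape of the statement the JDBC source
  // issues to resolve the schema of whatever is passed as dbtable.
  def schemaProbe(dbtable: String): String =
    s"SELECT * FROM $dbtable WHERE 1=0"

  // Hypothetical helper: wraps a query in parentheses and appends an alias,
  // producing a value that is valid directly after FROM on SQL Server.
  def asDerivedTable(query: String, alias: String): String =
    s"(${query.trim}) AS $alias"

  def main(args: Array[String]): Unit = {
    val query =
      "SELECT ta.name TableName FROM sys.tables ta WHERE ta.is_ms_shipped = 0"

    // Without an alias: the closing parenthesis is followed directly by WHERE,
    // which SQL Server reports as a syntax error near 'WHERE'.
    println(schemaProbe(s"($query)"))

    // With an alias: the derived table is named, so the statement parses.
    println(schemaProbe(asDerivedTable(query, "TEMP")))
  }
}
```

In spark-shell, paste the object and call DbtableProbe.main(Array()) to print both statements; the string returned by asDerivedTable is what would go into the dbtable option.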
