Spark - Microsoft SQL Server MSSQL using Windows Authentification to Parquet

Post date: Sep 17, 2019 8:32:04 AM

This snippet is using Spark running locally on my windows machine.

Start spark shell, make sure to include jdbc driver and sqljdbc_auth.dll. Both these files are from the microsoft jdbc driver download.

From CMD:

>spark-shell --jars mssql-jdbc-7.4.1.jre8.jar,sqljdbc_auth.dll --driver-class-path mssql-jdbc-7.4.1.jre8.jar

First set the timezone to UTC, or your local timezone. This will effect Date values in the parquet files. Be sure to set this correctly!

java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC"))

Create array of tables to export

var tables = Array("table_A","table_B","table_C")

create parquet file for each table

tables.foreach((table: String) => spark.read.format("jdbc") .option("url",

"jdbc:sqlserver://servername;databaseName=TESTDB;integratedSecurity=true") .option("dbtable", s"DIM.$table") .load() .write .parquet(s"c:\\src\\export\\$table.parquet"))