Just Code‎ > ‎

Spark - Microsoft SQL Server MSSQL using Windows Authentification to Parquet

posted Sep 17, 2019, 1:32 AM by Peter Henell   [ updated Oct 2, 2019, 5:59 AM ]
This snippet is using Spark running locally on my windows machine.

Start spark shell, make sure to include jdbc driver and sqljdbc_auth.dll. Both these files are from the microsoft jdbc driver download.

From CMD:
>spark-shell --jars mssql-jdbc-7.4.1.jre8.jar,sqljdbc_auth.dll --driver-class-path mssql-jdbc-7.4.1.jre8.jar

First set the timezone to UTC, or your local timezone. This will effect Date values in the parquet files. Be sure to set this correctly!
java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC"))
Create array of tables to export

var tables = Array("table_A","table_B","table_C")

create parquet file for each table

tables.foreach((table: String) => 
    spark.read.format("jdbc")
        .option("url", 
        "jdbc:sqlserver://servername;databaseName=TESTDB;integratedSecurity=true") .option("dbtable", s"DIM.$table") .load() .write .parquet(s"c:\\src\\export\\$table.parquet"))



Comments