Post date: Sep 17, 2019 8:32:04 AM
This snippet is using Spark running locally on my windows machine.
Start spark shell, make sure to include jdbc driver and sqljdbc_auth.dll. Both these files are from the microsoft jdbc driver download.
From CMD:
>spark-shell --jars mssql-jdbc-7.4.1.jre8.jar,sqljdbc_auth.dll --driver-class-path mssql-jdbc-7.4.1.jre8.jar
First set the timezone to UTC, or your local timezone. This will effect Date values in the parquet files. Be sure to set this correctly!
java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC"))
Create array of tables to export
var tables = Array("table_A","table_B","table_C")
create parquet file for each table
tables.foreach((table: String) => spark.read.format("jdbc") .option("url",
"jdbc:sqlserver://servername;databaseName=TESTDB;integratedSecurity=true") .option("dbtable", s"DIM.$table") .load() .write .parquet(s"c:\\src\\export\\$table.parquet"))