BDC - Simplest way to connect to Azure Data Lake Storage Gen2

Post date: Jun 7, 2020 6:17:22 PM

Starting to use SQL Server Big Data Clusters and trying to figure out how to access files from Azure Data Lake Storage Gen2?

The simplest way I've found is to use the storage account key. OAuth 2.0 is better in the long run but can be harder to configure.

This will do for demos and training:

spark.sparkContext.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.<storage account name>.blob.core.windows.net", "<azure data lake key>")
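Since this post targets ADLS Gen2, it's worth noting that the ABFS driver (abfss:// against the dfs endpoint) is the access path Microsoft recommends for Gen2 accounts. Assuming your cluster's hadoop-azure JAR includes the ABFS classes, the key-based equivalent looks roughly like this (a sketch, not something I've verified on BDC):

```scala
// Sketch: key-based auth via the ABFS (Gen2) driver instead of WASB.
// Assumes the hadoop-azure ABFS classes are on the classpath.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.<storage account name>.dfs.core.windows.net",
  "<azure data lake key>")

// Gen2 paths then use abfss:// and the dfs endpoint instead of wasbs:// and blob.
val baseDirAbfs = "abfss://<container name>@<storage account name>.dfs.core.windows.net/"
```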

Now we can access the files using the wasbs:// scheme:

val baseDir = "wasbs://<container name>@<storage account name>.blob.core.windows.net/"
val dfParquet = spark.read.parquet(baseDir + "some_file.snappy.parquet")
dfParquet.show(10)
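If you do want to go the OAuth 2.0 route mentioned above, the Hadoop ABFS driver accepts a service principal through configuration. A hedged sketch based on the standard hadoop-azure OAuth settings (nothing here is BDC-specific, and the <tenant id>, <client id> and <client secret> placeholders are for your own Azure AD app registration):

```scala
// Sketch: service-principal (OAuth 2.0) auth with the ABFS driver.
// These are the hadoop-azure client-credentials settings; fill in the
// placeholders from your Azure AD app registration.
val conf = spark.sparkContext.hadoopConfiguration
conf.set("fs.azure.account.auth.type", "OAuth")
conf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
conf.set("fs.azure.account.oauth2.client.id", "<client id>")
conf.set("fs.azure.account.oauth2.client.secret", "<client secret>")
conf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/<tenant id>/oauth2/token")
```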