Post date: Jun 7, 2020 6:17:22 PM
Starting to use SQL Server Big Data Clusters and trying to figure out how to access files from Azure Data Lake Storage?
The simplest way I've found is to use the storage account key. OAuth2 is better in the long run but can be harder to configure.
The key approach will do for demos and training:
spark.sparkContext.hadoopConfiguration.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.<storage account name>.blob.core.windows.net", "<azure data lake key>")
Now we can access the storage account using the wasbs:// scheme:
val baseDir = "wasbs://<container name>@<storage account name>.blob.core.windows.net/"
val dfParquet = spark.read.parquet(baseDir + "some_file.snappy.parquet")
dfParquet.show(10)
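As a quick sanity check that the key grants write access too, here is a sketch that writes a small copy of the DataFrame back to the same container and reads it again. It assumes the same Spark session, placeholders, and baseDir as above; the output file name is just an example.

```scala
// Sketch only: assumes spark, baseDir and dfParquet from the snippet above,
// and that the storage key grants write access to the container.
val dfOut = dfParquet.limit(100)

// Write a small Parquet copy back to the container
dfOut.write
  .mode("overwrite")
  .parquet(baseDir + "some_file_copy.parquet")

// Read it back to confirm round-trip access
val dfCheck = spark.read.parquet(baseDir + "some_file_copy.parquet")
println(dfCheck.count())
```

If the count prints without an authentication error, both read and write paths through wasbs are working.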