Just Code

Using this syntax highlighter, always using the Navy-style.  

Spark - Microsoft SQL Server MSSQL using Windows Authentification to Parquet

posted Sep 17, 2019, 1:32 AM by Peter Henell   [ updated Oct 2, 2019, 5:59 AM ]

This snippet is using Spark running locally on my windows machine.

Start spark shell, make sure to include jdbc driver and sqljdbc_auth.dll. Both these files are from the microsoft jdbc driver download.

From CMD:
>spark-shell --jars mssql-jdbc-7.4.1.jre8.jar,sqljdbc_auth.dll --driver-class-path mssql-jdbc-7.4.1.jre8.jar

First set the timezone to UTC, or your local timezone. This will effect Date values in the parquet files. Be sure to set this correctly!
java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC"))
Create array of tables to export

var tables = Array("table_A","table_B","table_C")

create parquet file for each table

tables.foreach((table: String) => 
    spark.read.format("jdbc")
        .option("url", 
        "jdbc:sqlserver://servername;databaseName=TESTDB;integratedSecurity=true") .option("dbtable", s"DIM.$table") .load() .write .parquet(s"c:\\src\\export\\$table.parquet"))



Nant - Property as a dictionary

posted Feb 5, 2016, 12:15 AM by Peter Henell   [ updated Feb 5, 2016, 12:15 AM ]

There is no built in Dictionary-type property in Nant. There are some third party libraries that provide such functionality but you can also do it with standard Nant.

The idea is that you want to access the value of some property based on the value of another property.

In C#:
var servers = new Dictionary<string,string>();
servers["DEV"] = "localhost:8080";
servers["CI"] = "jenkins.peterhenell.se:8080";
servers["PROD"] = "www.peterhenell.se:80";

// We can now dynamically choose the url to a server based on a string
var currentEnvironment = ReadFromConfiguration(ConfigKeys.CurrentEnvironment);
var targetServer = servers[currentEnvironment];
In Nant this would look like this:
<property name="servers.dev" value="localhost:8080" />
  <property name="servers.ci" value="jenkins.peterhenell.se:8080" />
  <property name="servers.prod" value="www.peterhenell.se:80" />

  <target name="default">
    <property name="current.environment" value="dev" />
    <property name="target.server" value="${property::get-value('servers.' + current.environment)}" />
    <echo message="${target.server}" />
  </target>

Batch - Automatically set JAVA_HOME to current installed version of java

posted Feb 4, 2016, 7:32 AM by Peter Henell   [ updated Feb 4, 2016, 7:32 AM ]

This script will set the user environment variable JAVA_HOME to the path of the current installed Oracle Java.

The current java version is read from the registry at:
HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\CurrentVersion
@echo OFF

setlocal ENABLEEXTENSIONS
set KEY_NAME="HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment"
set VALUE_NAME=CurrentVersion

FOR /F "usebackq tokens=3*" %%A IN (`REG QUERY %KEY_NAME% /v %VALUE_NAME%`) DO (
    set currentVersion=%%A %%B
    )
FOR /F "usebackq tokens=3*" %%A IN (`REG QUERY %KEY_NAME%\%currentVersion% /v JavaHome`) DO (
    set regJavaHome=%%A %%B
    )
ECHO %regJavaHome%

ECHO Setting JAVA_HOME to %regJavaHome%. You will need to restart CMD for this to take effect.
ECHO You can Exit the script now if you do not wish to set this variable.
PAUSE 
SETX JAVA_HOME "%regJavaHome%"
ECHO done.
pause

TSQL - Improved Parallel DML between 2008 and 2014 breaking code

posted Jan 11, 2016, 11:40 PM by Peter Henell   [ updated Jan 12, 2016, 1:01 AM ]

Parallel DML have been improved between the versions of SQL Server 2008 and SQL Server 2014 - which have broken some code. 
This particular code is borderline broken even in SQL Server 2008. I would not trust it. 

The required number of rows in #t to break this depends on your server. It have to be big enough to cause a parallel plan.

This code would run in SQL Server 2008 and produce unique IDs for all rows in the table.
In SQL Server 2014 the update statement is run in parallel and causes duplicates IDs to appear.

You can cause a parallel plan in SQL Server 2008 by adding the filter (where bloat <> '') on the update statement. However, the part that calculates the value of ID is done in serial and actually produces the correct result, for me, this time.

Don't do this at home kids.
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (id INT, bloat VARCHAR(500));

INSERT #t ( bloat )
SELECT TOP 2000000 'bloat n stuff'
FROM master..spt_values, master..spt_values b;

DECLARE @id INT = 0;

UPDATE t 
SET @id = id = @id + 1
FROM #t t;
--WHERE bloat <> ''

SELECT COUNT(*), COUNT(distinct id) FROM #t;
Plan in SQL Server 2014.
It is using a parallel plan and computing the ID (Compute Scalar) as part of the parallel part of the plan.
Plan in SQL Server 2008.
It is serial all the way.
We can trigger a parallel plan in SQL Server 2008 using a WHERE-clause.
The Compute Scalar operator is performed in the serial part of the plan, making the result unique.


TSQL - Query processor could not produce a query plan because of the hints defined in this query

posted Nov 24, 2015, 5:47 AM by Peter Henell   [ updated Oct 12, 2016, 12:13 AM ]

I received this error while writing TSQLT unit tests in SQL Server 2014 (I have not tested on any other version yet).

Msg 8622, Level 16, State 1, Line 24
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

It turns out to be related to returning a fixed set from a function while using LEFT HASH JOIN. The reason can be seen in the query plan. The GetCustomers function have completely removed and replaced by a constant scan. What is more is that the filter predicate have been pushed down to the scan of the Orders table. This leaves the Hash Join operator left with nothing to compute its hashes on!

What is interesting is that this only occurs when we select rows from a row value constructor. It works fine if we use a table instead. It also work fine if we add another row to the VALUES clause.

This is the code to reproduce the error and to show that it is does work using tables instead.

USE master;
GO
DROP DATABASE TestDatabase
GO
CREATE DATABASE TestDatabase;
GO
USE TestDatabase
GO

CREATE TABLE Orders(OrderID int, CustomerID int);
INSERT dbo.Orders
        ( OrderID, CustomerID )
VALUES  ( 1, 1 ),
        ( 2, 2 );

GO
CREATE FUNCTION GetCustomers()
RETURNS TABLE AS RETURN
    SELECT CustomerID 
    FROM (VALUES (1)) cu(CustomerID)
GO

-- This query now results in an ERROR
SELECT * 
FROM GetCustomers() c
LEFT HASH JOIN dbo.Orders o ON c.CustomerID = o.CustomerID
GO

-- If we try the same again, but this time we get our rows inside the function from a table
CREATE TABLE Customer (CustomerID INT);
INSERT dbo.Customer(CustomerID)
VALUES (1);

GO
-- Alter the function to read from the table instead
ALTER FUNCTION GetCustomers()
RETURNS TABLE AS RETURN
    SELECT CustomerID 
    FROM dbo.Customer;
GO

-- This time the function will work as intended
SELECT * 
FROM GetCustomers() c
LEFT HASH JOIN dbo.Orders o ON c.CustomerID = o.CustomerID

C# - Fill DataTable using NBuilder.Net

posted Sep 30, 2015, 1:20 AM by Peter Henell   [ updated Sep 30, 2015, 1:21 AM ]

How do you easily generate some rows based on a given a set of Columns in a DataTable?
Each column of each row need to be of the correct datatype and each value of each row should be different.

This is one way to to it using NBuilder.
First define the columns in the DataTable somehow. 
--

var dt = new DataTable("Customer");
            dt.Columns.Add("ID", typeof(int));
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("BirthDate", typeof(DateTime));
--
Secondly, create a dynamic class - with one public property per column of the correct type in the datatable - which we can fill with data using NBuilder.
--

/// <summary>
    /// Based on
    /// http://stackoverflow.com/questions/3862226/dynamically-create-a-class-in-c-sharp
    /// </summary>
    public class DynamicTypeBuilder
    {
        private Type _createdType;

        public DynamicTypeBuilder(System.Data.DataTable schema)
        {
            this._createdType = CompileResultType(schema);
        }

        public dynamic CreateNewObject(System.Data.DataTable schema)
        {
            var entity = Activator.CreateInstance(_createdType);
            return entity;
        }

        public Type CompileResultType(System.Data.DataTable schema)
        {
            TypeBuilder tb = GetTypeBuilder(schema.TableName);
            
            ConstructorBuilder constructor = tb.DefineDefaultConstructor(MethodAttributes.Public 
                | MethodAttributes.SpecialName 
                | MethodAttributes.RTSpecialName);

            foreach (System.Data.DataColumn field in schema.Columns)
                CreateProperty(tb, field.ColumnName, field.DataType);
            
            Type objectType = tb.CreateType();
            return objectType;
        }

        private TypeBuilder GetTypeBuilder(string typeSignature)
        {
            var an = new AssemblyName(typeSignature);
            AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
            ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule("MainModule");
            TypeBuilder tb = moduleBuilder.DefineType(typeSignature
                                , TypeAttributes.Public |
                                TypeAttributes.Sealed |
                                TypeAttributes.Class |
                                TypeAttributes.AutoClass |
                                TypeAttributes.AnsiClass |
                                TypeAttributes.BeforeFieldInit |
                                TypeAttributes.AutoLayout
                                , null);
            return tb;
        }

        private void CreateProperty(TypeBuilder tb, string propertyName, Type propertyType)
        {
            FieldBuilder fieldBuilder = tb.DefineField("_" + propertyName, propertyType, FieldAttributes.Private);

            PropertyBuilder propertyBuilder = tb.DefineProperty(propertyName, PropertyAttributes.HasDefault, propertyType, null);
            MethodBuilder getPropMthdBldr = tb.DefineMethod("get_" + propertyName
                                                , MethodAttributes.Public | 
                                                  MethodAttributes.SpecialName | 
                                                  MethodAttributes.HideBySig
                                                , propertyType
                                                , Type.EmptyTypes);

            ILGenerator getIl = getPropMthdBldr.GetILGenerator();

            getIl.Emit(OpCodes.Ldarg_0);
            getIl.Emit(OpCodes.Ldfld, fieldBuilder);
            getIl.Emit(OpCodes.Ret);

            MethodBuilder setPropMthdBldr =
                tb.DefineMethod("set_" + propertyName,
                  MethodAttributes.Public |
                  MethodAttributes.SpecialName |
                  MethodAttributes.HideBySig,
                  null, new[] { propertyType });

            ILGenerator setIl = setPropMthdBldr.GetILGenerator();
            Label modifyProperty = setIl.DefineLabel();
            Label exitSet = setIl.DefineLabel();

            setIl.MarkLabel(modifyProperty);
            setIl.Emit(OpCodes.Ldarg_0);
            setIl.Emit(OpCodes.Ldarg_1);
            setIl.Emit(OpCodes.Stfld, fieldBuilder);

            setIl.Emit(OpCodes.Nop);
            setIl.MarkLabel(exitSet);
            setIl.Emit(OpCodes.Ret);

            propertyBuilder.SetGetMethod(getPropMthdBldr);
            propertyBuilder.SetSetMethod(setPropMthdBldr);
        }
    }
--
Thirdly, add the values generated into the table.
--

private static void Fill(DataTable schema, int rows)
        {
            var builder = new DynamicTypeBuilder(schema);

            var o = builder.CreateNewObject(schema);
            Type t = o.GetType();
            var entities = GetListOfGeneratedObjects(o, rows);
            foreach (var entity in entities)
            {
                var row = schema.NewRow();
                foreach (var prop in t.GetProperties())
                {
                    row[prop.Name] = t.GetProperty(prop.Name).GetValue(entity);
                }
                schema.Rows.Add(row);
            }
        }

        public static IList<T> GetListOfGeneratedObjects<T>(T objectTemplate, int rows)
        {
            return Builder<T>.CreateListOfSize(rows).Build();
        }

-
Complete Code:
--
class Program
    {
        static void Main(string[] args)
        {
            var dt = new DataTable("Customer");
            dt.Columns.Add("ID", typeof(int));
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("BirthDate", typeof(DateTime));

            Fill(dt, 100);
            foreach (DataRow row in dt.Rows)
            {
                foreach (DataColumn col in dt.Columns)
                {
                    Console.WriteLine(col.ColumnName + ":" + row[col]);
                }
            }
        }

        private static void Fill(DataTable schema, int rows)
        {
            var builder = new DynamicTypeBuilder(schema);

            var o = builder.CreateNewObject(schema);
            Type t = o.GetType();
            var entities = GetListOfGeneratedObjects(o, rows);
            foreach (var entity in entities)
            {
                var row = schema.NewRow();
                foreach (var prop in t.GetProperties())
                {
                    row[prop.Name] = t.GetProperty(prop.Name).GetValue(entity);
                }
                schema.Rows.Add(row);
            }
        }

        public static IList<T> GetListOfGeneratedObjects<T>(T objectTemplate, int rows)
        {
            return Builder<T>.CreateListOfSize(rows).Build();
        }
    }
/// <summary>
    /// Based on
    /// http://stackoverflow.com/questions/3862226/dynamically-create-a-class-in-c-sharp
    /// </summary>
    public class DynamicTypeBuilder
    {
        private Type _createdType;

        public DynamicTypeBuilder(System.Data.DataTable schema)
        {
            this._createdType = CompileResultType(schema);
        }

        public dynamic CreateNewObject(System.Data.DataTable schema)
        {
            var entity = Activator.CreateInstance(_createdType);
            return entity;
        }

        public Type CompileResultType(System.Data.DataTable schema)
        {
            TypeBuilder tb = GetTypeBuilder(schema.TableName);
            
            ConstructorBuilder constructor = tb.DefineDefaultConstructor(MethodAttributes.Public 
                | MethodAttributes.SpecialName 
                | MethodAttributes.RTSpecialName);

            foreach (System.Data.DataColumn field in schema.Columns)
                CreateProperty(tb, field.ColumnName, field.DataType);
            
            Type objectType = tb.CreateType();
            return objectType;
        }

        private TypeBuilder GetTypeBuilder(string typeSignature)
        {
            var an = new AssemblyName(typeSignature);
            AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
            ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule("MainModule");
            TypeBuilder tb = moduleBuilder.DefineType(typeSignature
                                , TypeAttributes.Public |
                                TypeAttributes.Sealed |
                                TypeAttributes.Class |
                                TypeAttributes.AutoClass |
                                TypeAttributes.AnsiClass |
                                TypeAttributes.BeforeFieldInit |
                                TypeAttributes.AutoLayout
                                , null);
            return tb;
        }

        private void CreateProperty(TypeBuilder tb, string propertyName, Type propertyType)
        {
            FieldBuilder fieldBuilder = tb.DefineField("_" + propertyName, propertyType, FieldAttributes.Private);

            PropertyBuilder propertyBuilder = tb.DefineProperty(propertyName, PropertyAttributes.HasDefault, propertyType, null);
            MethodBuilder getPropMthdBldr = tb.DefineMethod("get_" + propertyName
                                                , MethodAttributes.Public | 
                                                  MethodAttributes.SpecialName | 
                                                  MethodAttributes.HideBySig
                                                , propertyType
                                                , Type.EmptyTypes);

            ILGenerator getIl = getPropMthdBldr.GetILGenerator();

            getIl.Emit(OpCodes.Ldarg_0);
            getIl.Emit(OpCodes.Ldfld, fieldBuilder);
            getIl.Emit(OpCodes.Ret);

            MethodBuilder setPropMthdBldr =
                tb.DefineMethod("set_" + propertyName,
                  MethodAttributes.Public |
                  MethodAttributes.SpecialName |
                  MethodAttributes.HideBySig,
                  null, new[] { propertyType });

            ILGenerator setIl = setPropMthdBldr.GetILGenerator();
            Label modifyProperty = setIl.DefineLabel();
            Label exitSet = setIl.DefineLabel();

            setIl.MarkLabel(modifyProperty);
            setIl.Emit(OpCodes.Ldarg_0);
            setIl.Emit(OpCodes.Ldarg_1);
            setIl.Emit(OpCodes.Stfld, fieldBuilder);

            setIl.Emit(OpCodes.Nop);
            setIl.MarkLabel(exitSet);
            setIl.Emit(OpCodes.Ret);

            propertyBuilder.SetGetMethod(getPropMthdBldr);
            propertyBuilder.SetSetMethod(setPropMthdBldr);
        }
    }

Nant - Find File By Pattern

posted Sep 29, 2015, 11:27 PM by Peter Henell   [ updated Sep 29, 2015, 11:27 PM ]

This is a script snippet that returns the path to the first file found in ${directory} that matches ${pattern}

<script language="C#" prefix="files" >
    <code>
    <![CDATA[
        [Function("find-file")]
        public static string FindFirstFileInFolder(string startDir, string pattern)
        {
            // Returns first filename that matches the pattern.
            var files = Directory.GetFiles(startDir, pattern, SearchOption.AllDirectories);
            return files.Length > 0 ? files[0] : null;
        }
    ]]>
          </code>
  </script>

This examples finds the package zip file in the target folder and copies it to a specific package archive directory. 
<target name="copyZipToPackageFolder">
    <property name="pattern" value="*.zip" />
    <property name="target.folder" value="${directory::get-current-directory()}\target" />
    <property name="package.zip.path" value="${files::find-file(target.folder, pattern)}" />

    <copy file="${package.zip.path}" tofile="${archive.package.path}" />
  </target>

Windows 10 - Disable thumbs.db creation

posted Apr 9, 2015, 4:13 AM by Peter Henell   [ updated Apr 9, 2015, 4:14 AM ]

  1. Open notepad and enter the following. 
  2. Save the file as disable_thumbsdb.reg. 
  3. Doubleclick the file to run it. 
  4. You may need to restart your system for it to take effect.
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\CurrentVersion\Explorer\Advanced]
"DisableThumbnailCache"=dword:00000001
"DisableThumbsDBOnNetworkFolders"=dword:00000001

TSQL - Aliases for column names from Stored Procedure results

posted Apr 3, 2015, 1:46 PM by Peter Henell   [ updated Apr 3, 2015, 1:47 PM ]

In Sql Server 2012 there was a feature introduced called WITH RESULT SETS which allow (among other things) to put aliases on the result set of stored procedures.

This example will put aliases on all the columns returned from the stored procedure.
create procedure #testing as
begin
    select a, b, c from (values (1, 2, 3), (11, 22, 33)) as t(a, b, c);
end;
go

exec #testing
WITH RESULT SETS
(
 ( 
  [Amazing New Name for A]             NVARCHAR(100),
  [The B column is also renamed]  NVARCHAR(20),
  [The C column Renamed]         NVARCHAR(30)
 ) 
);

TSQL - Deadlock while setting database to MULTI_USER mode

posted Mar 25, 2015, 5:38 AM by Peter Henell   [ updated Mar 25, 2015, 5:42 AM ]

A strange issue when setting the database back to MULTI_USER mode.

Msg 1205, Level 13, State 68, Line 9

Transaction (Process ID 56) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Msg 5069, Level 16, State 1, Line 9

ALTER DATABASE statement failed.


Someone had snuck in and stolen our single available connection to the database.

This query was used to investigate who it was:
USE Master;
SELECT sd.name, sp.spid, sp.login_time, sp.loginame 
FROM sysprocesses sp 
INNER JOIN sysdatabases sd on sp.dbid = sd.dbid  
WHERE sd.name = 'YourPoorDatabase'
The query showed that was it was me, myself. I had a connection open somewhere.
I went right ahead and forced the database into multi_user mode.

I first try to use the WITH NO_WAIT option because I would prefer to not force a rollback in case my other connection was doing 
something meaningful.
When that did not work I had to force it with the WITH ROLLBACK IMMEDIATE;

The DEADLOCK_PRIORITY HIGH setting will make sure that we will survive the strange deadlock scenario.

The other connection would be picked as the deadlock victim.
USE [master];
SET DEADLOCK_PRIORITY HIGH;
ALTER DATABASE [YourPoorDatabase] SET MULTI_USER WITH NO_WAIT;
ALTER DATABASE [YourPoorDatabase] SET MULTI_USER WITH ROLLBACK IMMEDIATE;

1-10 of 230

Comments