Tuesday, June 16, 2015

PDW to find out which table are external table

Normal source of getting all the table list in the database.
  1. INFORMATION_SCHEMA.TABLES
  2. sys.tables
  3. sys.objects
  4. sys.all_objects
  5. OBJECTPROPERTY ()
  6. OBJECTPROPERTYEX ()
However, none of above can tell you which table is external table. In order to find out which table are external table, you will have to look into sys.external_tables.

Query
SELECT *
FROM sys.external_tables

Wednesday, April 22, 2015

Why DelayValidation and ValidateExternalMetadata is important

i came across with a question of why setting "DelayValidation" and "ValidateExternalMetadata" are important.

For those who are new to SSIS. "DelayValidation" only can be found to all the component at Control Flow.




However "ValidateExternalMetadata" only can be found to all the component at Data Flow.


Now lets play with the setting with the monitoring tools SQL Profiler.

Scenario 1 - Execute package with default setting

Total number of prepare execution = 3.

Scenario 2 - Execute package with "DelayValidation" set to True.

Total number of prepare execution = 2.

Scenario 3 - Execute package with "DelayValidation" set to True and "ValidateExternalMetadata" to False

Total number of prepare execution = 1.

Apart from this, there are a huge drawback if the setting are leave default when the package are open from BIDS or SSDT. You may need to wait for decade for the package to be open as BIDS and SSDT will validate all the component in the package.

Take note to the square box bottom left of the picture.



Wednesday, October 29, 2014

NULL comparison

Today i had learn a pain lesson of comparing null value with a string.

Lets create a table with 2 column and 2 rows.
Query
DECLARE @Test TABLE(Col1 INT, Col2 INT)
INSERT INTO @Test VALUES (1,2),(2,NULL)

Output

The mistake i had done with following query; the original intention is to get all data except row no.2 which happen the be null at column 2 (filtering condition).

Query
SELECT *
FROM @Test
WHERE Col2 <> 2

Output

There are multiple way to address this issue however i would like to show why the comparison was correct in the way it is.
SELECT CASE WHEN 1 = NULL THEN 1
WHEN 1 <> NULL THEN 2
ELSE 3 END

As you can see, NULL is not comparable, it is not "True" or "False" but "Unknown". For more details please read this.

SET ANSI_NULLS OFF will not help in this scenario as it is not comparing null with null. If will only be helpful if the where clause like
1. WHERE Col2 = NULL
2. WHERE Col2 <> NULL

Solution:
1. Update the source column to other valid value before comparing.
2. Using COALESCE() or ISNULL() to replace null value to other valid value before comparing.
3. Address the null value in the where clause. (WHERE Col2 <> 2 OR Col2 IS NULL)

Tuesday, October 28, 2014

NOLOCK vs READPAST

Comparison of different kind of dirty-read.

Lets create a table with 5 rows of data to start the testing.
CREATE TABLE Test_Locking (Col1 int)
INSERT INTO Test_Locking VALUES (1),(2),(3),(4),(5)

Check the row count of the table before we start.
SELECT COUNT(*) FROM Test_Locking

Why sometime NOLOCK/READPAST is needed? You might had to read this.
A 5 row record table took more than 2 minute without returning any result.

We will be testing all of the DML (Data Manipulation Language) with OPEN transaction; without any commit and rollback activities.

For insert scenario:
Query
BEGIN TRANSACTION
       INSERT INTO Test_Locking VALUES (6)

Output

Execute "ROLLBACK TRANSACTION" after the result were shown.

For update scenario:
Query
BEGIN TRANSACTION
       UPDATE Test_Locking SET Col1 = 6 WHERE Col1 = 1

Output

Execute "ROLLBACK TRANSACTION" after the result were shown.

For delete scenario
BEGIN TRANSACTION
       DELETE Test_Locking WHERE Col1 = 1

Output

Execute "ROLLBACK TRANSACTION" after the result were shown.

Saturday, April 26, 2014

Display MSSQL PDW table rows by node, distribution, partition and COLUMNSTORE state


DBCC PDW_SHOWSPACEUSED () - Displays the number of rows, disk space reserved, and disk space used for a table in a SQL Server PDW database, or displays the disk space reserved and used for an entire SQL Server PDW database.

Advantages
- Useful to check data skew by distribution and node.

Disadvantages
- Not able to display data that sit on delta store, causing confusion that there are no record in the table.
- Not able to further filter.
- Multiple execution require for different table.

DBCC PDW_SHOWPARTITIONSTATS() - Displays the size and number of rows for each partition of a table in a SQL Server PDW database.

Advantages
- Display all row count regardless the data sit on delta store or column store.

Disadvantages
- Does not indicate how many rows of record sit on delta store.
- Not able to further filter.
- Multiple execution require for different table.

Query
SELECT
       Table_Name = OBJECT_NAME(pdw_table_mappings.object_id)
,      Table_Distribution = CASE WHEN pdw_table_distribution_properties.distribution_policy_desc = 'HASH' THEN 'DISTRIBUTE' ELSE pdw_table_distribution_properties.distribution_policy_desc END
,      Index_Type = indexes.type_desc
,      PDW_Node = pdw_nodes_partitions.pdw_node_id
,      Distribution = pdw_distributions.distribution_id
,      Partition_Number = pdw_nodes_partitions.partition_number
,      Partition_Position = CASE WHEN partition_functions.boundary_value_on_right = 1 THEN 'RIGHT' WHEN partition_functions.boundary_value_on_right = 0 THEN 'LEFT' END
,      Partition_Value = partition_range_values.value
,      Rows = CASE WHEN indexes.type IN (0,1) THEN
                           pdw_nodes_partitions.rows
                     ELSE
                           pdw_nodes_column_store_row_groups.total_rows
                     END
,      ColumnStore_State = pdw_nodes_column_store_row_groups.state_description

FROM   sys.pdw_table_mappings

              LEFT JOIN sys.indexes
              ON indexes.object_id = pdw_table_mappings.object_id
              AND indexes.type <> 2

              LEFT JOIN sys.pdw_table_distribution_properties
              ON pdw_table_distribution_properties.object_id = pdw_table_mappings.object_id
                          
              LEFT JOIN sys.pdw_nodes_tables
              ON pdw_nodes_tables.name = pdw_table_mappings.physical_name
             
              LEFT JOIN sys.pdw_distributions
              ON ((pdw_table_distribution_properties.distribution_policy = 2 AND pdw_distributions.pdw_node_id = pdw_nodes_tables.pdw_node_id) OR 1 = 2)
              AND ((pdw_table_distribution_properties.distribution_policy = 2 AND pdw_distributions.name = RIGHT(pdw_nodes_tables.name, 1)) OR 1 = 2)

              LEFT JOIN (   SELECT object_id, index_id, partition_number, pdw_node_id, rows = sum(rows)
                                  FROM sys.pdw_nodes_partitions
                                  GROUP BY object_id, index_id, partition_number, pdw_node_id)pdw_nodes_partitions
              ON pdw_nodes_partitions.object_id = pdw_nodes_tables.object_id
              AND pdw_nodes_partitions.pdw_node_id = pdw_nodes_tables.pdw_node_id
              AND pdw_nodes_partitions.index_id = indexes.index_id
             
              LEFT JOIN sys.partitions
              ON partitions.object_id = indexes.object_id
              AND partitions.index_id = indexes.index_id
              AND partitions.partition_number = pdw_nodes_partitions.partition_number
             
              LEFT JOIN sys.partition_schemes
              ON partition_schemes.data_space_id = indexes.data_space_id

              LEFT JOIN sys.partition_functions
              ON partition_functions.function_id = partition_schemes.function_id

              LEFT JOIN sys.partition_range_values
              ON partition_range_values.function_id = partition_functions.function_id
              AND COALESCE(partition_range_values.boundary_id + 1, 1) = partitions.partition_number   
                    
              LEFT JOIN (   SELECT object_id, index_id, partition_number, state_description, pdw_node_id, total_rows = SUM(total_rows)
                                  FROM sys.pdw_nodes_column_store_row_groups
                                  WHERE total_rows IS NOT NULL
                                  GROUP BY object_id, index_id, partition_number, state_description, pdw_node_id)pdw_nodes_column_store_row_groups
              ON pdw_nodes_column_store_row_groups.object_id = pdw_nodes_partitions.object_id
              AND pdw_nodes_column_store_row_groups.index_id = pdw_nodes_partitions.index_id
              AND pdw_nodes_column_store_row_groups.partition_number = pdw_nodes_partitions.partition_number
              AND pdw_nodes_column_store_row_groups.pdw_node_id = pdw_nodes_partitions.pdw_node_id

Output



Tuesday, April 15, 2014

Add 1 hour to datetime column

Add an hour to datetime column with decimal value

Query
DECLARE @DateTime DATETIME = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)

SELECT
      @DateTime
,     DATEADD(HOUR, 1, @DateTime)
,     @DateTime + 0.04166667

Output

Wednesday, February 12, 2014

SSIS Best Practice #1 - Avoid using drop & create table in SSIS

Out of sudden, one of my agent job has fail and the error message was

Description: Failed to open a fastload rowset for "[dbo].[************]". Check that the object exists in the database.

In a controlled production environment, no single one could drop a table without any testing and checking. So what causes the error?

Trace back to the starting failure point.

Time-out occurred while waiting for buffer latch type 2 for page (11:11042329), database ID 11

This error message was actually due to SQL Server was too busy to create table which later causes the table seems to be missing from SSIS point of view.

In this case, i had to create the table manually due to SSIS would validate the package before executing it.

After further understanding, the reason of using Drop and Create is to reset the identity column back to 1.

In the conclusion, i highly advise that try not to use Drop and Create method in SSIS, you may try using Truncate Table statement; which also will reset identity column.