Wednesday, October 10, 2012

Time Difference in Hours in Teradata



In Teradata, calculating a time difference is a bit trickier than SQL Server's DATEDIFF(HH, Date1, Date2) command.
So here is the syntax/code for calculating the time difference in hours between today's date and EventDate.
Syntax:


SELECT
  CAST(CURRENT_DATE AS TIMESTAMP(6)) AS TodaysDate   -- today's date
, EventDate                                          -- date you want to compare with
, (CAST(EventDate AS TIMESTAMP(6)) - CAST(CURRENT_DATE AS TIMESTAMP(6))) DAY(4) TO MINUTE AS DHM  -- time difference as days hours:minutes
, 24 * EXTRACT(DAY FROM DHM) AS HoursFromDays        -- hours contributed by the complete days
, EXTRACT(HOUR FROM DHM) AS ExtraHours               -- extra hours beyond the complete days
, 24 * EXTRACT(DAY FROM DHM) + EXTRACT(HOUR FROM DHM) AS HoursDiff  -- exact time difference in hours
FROM Database.TableName
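If EventDate is a plain DATE rather than a timestamp, there is a shortcut worth knowing: in Teradata, subtracting one DATE from another yields an integer day count, so multiplying by 24 gives the same whole-hour figure without any interval arithmetic. A minimal sketch, using the same assumed table and column names as above:

```sql
SELECT EventDate
     , (EventDate - CURRENT_DATE) * 24 AS HoursDiff  -- day difference times 24 hours
FROM Database.TableName
```

This only works while both operands are DATEs; once a time-of-day component is involved, the interval approach is the safer route.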





Friday, October 5, 2012

Usage of NULL and NOT Operator

The word NULL is a reserved word in SQL. As you know, NULL represents nothing, or more accurately missing data or an empty cell value. A comparison with NULL always returns NULL, never TRUE or FALSE. To find NULL values we can either use the IS NULL and IS NOT NULL predicates, or we can substitute a default value with the COALESCE function.
Usage of NOT operator

It is a curious truth that sometimes in life we do NOT want something. To satisfy this need, SQL provides a NOT operator. Just prefix it to any comparison operator and you will get what you do NOT want. Here is the syntax.

SELECT {*|<col1>}, <col2>, <col3>…
FROM <table name>
WHERE <column name> NOT <operator> <another column name or constant value>
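For instance, using the employee table that appears later in this post, listing everyone outside a hypothetical department 100 looks like this:

```sql
SELECT first_name, last_name
FROM employee
WHERE dept_no NOT = 100
```

Teradata also accepts <> as a not-equal operator; NOT = is simply the form built from the NOT prefix described above.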

IN, ANY, ALL Operators in Teradata

The IN operator lets you test a column against multiple values in the WHERE clause. It is an alternative to chaining multiple OR conditions. Here is the syntax.

SELECT {*|<col1>}, <col2>, <col3>…
FROM <table name>
WHERE <column name> [NOT] IN (<list of values>)

For example, to select employees from a list of department numbers, use the following SQL.
SELECT first_name, last_name
FROM employee
WHERE dept_no IN (100,250,368)
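The query above is equivalent to chaining OR conditions, which is exactly what IN replaces:

```sql
SELECT first_name, last_name
FROM employee
WHERE dept_no = 100
   OR dept_no = 250
   OR dept_no = 368
```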


Remember that a result set is returned even if there is only one matching dept_no found. As mentioned before, if the list contains a NULL then a NOT IN comparison returns no rows at all. To find rows not matching the list of values, prefix the NOT operator before IN as follows:

SELECT first_name, last_name
FROM employee
WHERE dept_no NOT IN (210,652,555,871)
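To see the NULL pitfall concretely: every comparison against a NULL in the list evaluates to NULL, so the NOT IN condition can never be TRUE. A sketch against the same employee table:

```sql
-- Returns no rows at all, regardless of the data,
-- because of the NULL in the list:
SELECT first_name, last_name
FROM employee
WHERE dept_no NOT IN (210, 652, NULL)
```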

If you want another alternative to this, you can go for ANY, but it has to be used with a comparison operator as follows.

SELECT {*|<col1>}, <col2>, <col3>…
FROM <table name>
WHERE <column name> = ANY (<list of values>)

To negate it, swap ANY for ALL and prefix the NOT operator before the equal sign,
i.e. NOT = ALL (<list of values>)
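Rewriting the earlier department queries in this style, against the same assumed employee table:

```sql
-- Equivalent to: dept_no IN (100, 250, 368)
SELECT first_name, last_name
FROM employee
WHERE dept_no = ANY (100, 250, 368)

-- Equivalent to: dept_no NOT IN (100, 250, 368)
SELECT first_name, last_name
FROM employee
WHERE dept_no NOT = ALL (100, 250, 368)
```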



Rank vs Row_Number in Teradata


Anyone working on Teradata will be aware of its two most commonly used OLAP functions:
1.      Rank()
2.      Row_Number()

In one of my earlier posts I mentioned how to generate a sequence number (e.g. a surrogate key) using Row_Number. Today let us find out the difference in usage of the RANK and ROW_NUMBER functions by taking the following scenario on a table (ProductSales).


SELECT Product, DateOfSale, SalesAmount,
RANK() OVER (ORDER BY SalesAmount DESC) AS SalesRank,         -- RANK function
ROW_NUMBER() OVER (ORDER BY SalesAmount DESC) AS SalesRowNum  -- ROW_NUMBER function
FROM ProductSales

The result set is as follows:



In the result set, we can observe the following:
1.       Rows 1 and 2 have the same rank assigned, since they have the same SalesAmount value.
2.       Rows 1 and 2 have different row numbers assigned, even though they have the same SalesAmount value.
3.       Row 3 has rank "3" assigned, because RANK skips ranking positions after a tie.

Hence, rows are always assigned an incrementing sequence number with ROW_NUMBER, while rows with the same ordering value are assigned the same number with the RANK function.
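To make the difference concrete, here is what the two functions would produce over a small, hypothetical ProductSales extract (the figures are illustrative only):

```
Product  SalesAmount  RANK  ROW_NUMBER
A        500          1     1
B        500          1     2
C        300          3     3
D        200          4     4
```

Note the tie at 500: RANK repeats 1 and then jumps to 3, while ROW_NUMBER keeps counting 1, 2, 3, 4.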

List of tables and columns in Teradata database

How to extract list of columns, Tables in database (using Teradata SQL)
Syntax:



SELECT DatabaseName, TableName, ColumnName, ColumnFormat, ColumnType, ColumnLength, Nullable
,(CASE
WHEN columntype='CV' THEN 'VARCHAR'
WHEN columntype='DA' THEN 'DATE'
WHEN columntype='I' THEN 'INTEGER'
WHEN columntype='CF' THEN 'CHAR'
WHEN columntype='F' THEN 'FLOAT'
WHEN columntype='I1' THEN 'BYTEINT'
WHEN columntype='I2' THEN 'SMALLINT'
WHEN columntype='I8' THEN 'BIGINT'
WHEN columntype='BO' THEN 'BLOB'
WHEN columntype='CO' THEN 'CLOB'
WHEN columntype='TS' THEN 'TIMESTAMP'
WHEN columntype='BF' THEN 'BYTE'
WHEN columntype='SZ' THEN 'TIMESTAMP W/ZONE'
WHEN columntype='BV' THEN 'VARBYTE'
WHEN columntype='DM' THEN 'INTERVAL DAY TO MINUTE'
WHEN columntype='D' THEN 'DECIMAL'
ELSE columntype END) AS DataType
FROM DBC.COLUMNS
WHERE databasename = 'DatabaseName'
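In practice you usually narrow the same dictionary query down to a single table. A sketch, where 'HR' and 'employee' are placeholder names:

```sql
SELECT ColumnName, ColumnType, ColumnLength, Nullable
FROM DBC.COLUMNS
WHERE databasename = 'HR'
  AND tablename = 'employee'
ORDER BY ColumnId   -- list columns in their table-definition order
```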


Wednesday, June 8, 2011

Unpivot Transformation with Multiple Destination Columns

The Unpivot transformation turns an unnormalized dataset into a more normalized version by expanding values from multiple columns in a single record into multiple records with the same values in a single column. For example, a dataset that lists customer names has one row for each customer, with the products and the quantity purchased shown in columns of the row. After the Unpivot transformation normalizes the data set, it contains a different row for each product that the customer purchased.

The need to unpivot data during an ETL operation is quite common. One method is to write the source query in such a way that the data starts off unpivoted, using the UNPIVOT T-SQL statement in SQL Server. This, however, would obviously not work on a source that does not support unpivoting, such as a text file, Office Excel, or even older versions of popular RDBMSs. The other method, of course, is to use the Unpivot transformation in Integration Services.
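For reference, the T-SQL route looks like this; a minimal sketch against a hypothetical Product table with quantity columns Qty1 through Qty3:

```sql
SELECT ProductName, QtyColumn, Qty
FROM Product
UNPIVOT (Qty FOR QtyColumn IN (Qty1, Qty2, Qty3)) AS u
```

Each source row fans out into up to three rows, one per non-NULL Qty column.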

The Unpivot transformation is quite a simple component, especially if you need to unpivot a bunch of columns to a single destination column. Unpivoting to multiple destination columns, however, seemed a tad challenging the first time round, at least for me it did, until I figured out how. This post is all about unpivoting columns to multiple destination columns.

Consider this scenario. A simple Product table (which I shall call Source from this point forward):


needs to be transformed into the following Product table (which I shall call Destination from this point forward):


Configuring the Unpivot transformation for this would be quite simple, and would look like this:

The problem starts when you need the Destination table looking like this, with multiple destination columns (Qty and Cost):

You add the Cost columns as well (as in the image below) to the Unpivot transformation,

and get the following validation error (encountered by a lot of people):

PivotKeyValue is not valid. In an UnPivot transform with more than one unpivoted DestinationColumn, the set of PivotKeyValues per destination must match exactly.

The reason is this. If you look closely at the Pivot Key Value column of the Unpivot transformation you would notice that for the Qty destination column, the set of values are ‘Qty1’, ‘Qty2’ and ‘Qty3’. Whereas, for the Cost destination column, the set of values are ‘Cost1’, ‘Cost2’ and ‘Cost3’, which are clearly not matching. The solution to the problem here is quite trivial. Change the default values of the Pivot Key Value column of the offending destination column to match the values of the other destination columns. Like this:


And it's done.

Instead of Qty1, Qty2 and Qty3, the "Pivot Key Value" entries are now simply 1, 2 and 3.