Friday, January 16, 2009

Data truncation with OBIEE Applications 7.9.5 OOB ETLs

This post is applicable only for OBIEE 7.9.5 applications configured to run with data movement type as UNICODE. After the recent upgrade to 7.9.5 (Informatica 8.1), in development, some ETLs (Bank, GL Account, Employee and User dimensions) started failing with either Unique constraint error or duplicate keys found. After looking at data, it was clear that OBIEE is getting loaded with duplicate values in INTEGRATION_ID column causing duplicate keys found error while creating unique index, which is a composite key on columns - DATASOURCE_NUM_ID, INTEGRATION_ID and EFFECTIVE_FROM_DT.

When I exported the source data and stage table data to a spreadsheet and started ro by row comparison with VLOOKUPs in Excel, it was clear that Informatica while loading stage table, it either truncated a character in INTEGRATION_ID or added some junk characters in some VARCHAR columns. Because the truncated values in INTEGRATION_ID matched already existing values, duplicate keys found error was thrown.

After raising with support and spending some time on it, I figured that it is Default Buffer Block Size setting in Informatica that is causing the issue. Defaukt Buffer Block size is configured as 1280000 in all of the ETL sessions that are failing. I reverted the value to 128000 and the sessions started to run with no truncation and no junk characters.

Informatica support directed me to the Performance tuning guide, which has details on 2 very important session level parameters (Default Buffer Block Size and DTM Buffer Size), confuguring which without understanding the implications can lead to data loss without even realizing the problem. I am putting dowm the explanation in brief.

Informatica Integration Service process starts the Data Transformation Manager (DTM) when you start a session and DTM allocates buffer memory to the session at runtime based on the DTM Buffer Size setting in the session properties. The DTM divides the memory into buffer blocks as configured in the Default Buffer Block Size setting in the session properties. The reader, transformation, and writer threads use buffer blocks to move data from sources to targets. The buffer block size should be larger than the precision for the largest row of data in a source or target.

The Integration Service allocates at least 2 buffer blocks for each source and target partition. Use the following calculation to determine buffer block requirements:
[(total number of sources + total number of targets)* 2] = (session buffer blocks)


Session Buffer Blocks = No. of Partitions * (DTM Buffer Size / Buffer Block Size)

In OOB ETLs, Buffer Block Size is increased to 1280000 leaving DTM Buffer Size as 32000000. Session Buffer Blocks = (32000000/1280000), which is 25 blocks and that seem to be not enough to move data from source to taget, which case it should have failed rather than truncating characters. But, my first test with 128000 for Default Buffer Block size worked, which resulted 250 Memory Blocks and this seem good enough to move data. I was not sure why this fixed but support directed me to right document.

Here is some additional information that may be useful.

You configure buffer memory settings by adjusting the following session parameters:

DTM Buffer Size. The DTM buffer size specifies the amount of buffer memory the Integration Service uses when the DTM processes a session. Configure the DTM buffer size on the Properties tab in the session properties.

Default Buffer Block Size. The buffer block size specifies the amount of buffer memory used to move a block of data from the source to the target. Configure the buffer block size on the Config Object tab in the session properties.

Refer to Power Center Workflow Administration Guide or Performance Tuning Guide for more detailed explanation.

2 comments:

Pran said...

good work!!

Anonymous said...

Hi...Really helped me... I was getting the issue with w_inventory_product_ds table....where last characters were getting truncated...thanks a lot for your findings...and posting them here...