Tuesday, December 20, 2011

How to Determine the PowerCenter session that creates the PMGMD cache file

Solution
Recovery cache file names use the following format:
pmgmd_metadata_[repository GUID]_[workflow ID]_[folder ID]_[session ID]_[transformation ID]_... .dat
Example
pmgmd_metadata_50fc0313_9762_4670_98e9_7e2e5b7a93b8_268_5_191_61_0_0_1_1.dat
In this example:
50fc0313_9762_4670_98e9_7e2e5b7a93b8 is the Repository GUID
268 is the workflow ID
5 is the folder ID
191 is the session ID
61 is the transformation ID
To determine the name of the workflow, use the workflow ID and folder ID in the following query:


SELECT WORKFLOW_NAME FROM REP_WORKFLOWS WHERE WORKFLOW_ID=268 AND SUBJECT_ID=5
To determine the name of the session and folder, use the session ID and folder ID in the following query:


SELECT SESSION_NAME, SUBJECT_AREA FROM REP_LOAD_SESSIONS WHERE SESSION_ID=191 AND SUBJECT_ID=5
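As a quick sketch, the ID fields can also be pulled out of a cache file name with standard shell tools before running the queries above. The file name below is the sample from this article, and the field positions follow the layout described above (five GUID tokens, then workflow ID, folder ID, session ID, transformation ID).

```shell
#!/bin/sh
# Sketch: split a pmgmd recovery cache file name into its ID fields.
FILE="pmgmd_metadata_50fc0313_9762_4670_98e9_7e2e5b7a93b8_268_5_191_61_0_0_1_1.dat"

# Strip the fixed prefix and the .dat extension, leaving underscore-separated tokens.
FIELDS=`echo "$FILE" | sed -e 's/^pmgmd_metadata_//' -e 's/\.dat$//'`

# The repository GUID occupies the first five tokens, so the IDs start at field 6.
WORKFLOW_ID=`echo "$FIELDS" | cut -d_ -f6`
FOLDER_ID=`echo "$FIELDS" | cut -d_ -f7`
SESSION_ID=`echo "$FIELDS" | cut -d_ -f8`
echo "workflow=$WORKFLOW_ID folder=$FOLDER_ID session=$SESSION_ID"
```

The printed IDs can then be substituted into the WORKFLOW_ID, SUBJECT_ID, and SESSION_ID predicates of the repository queries.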

How to Determine the process associated with an Integration Service or Repository Service

Solution


If there are multiple Integration Services enabled under one node, there are multiple pmserver processes running on the same machine. To determine the process associated with an Integration Service or Repository Service, do the following:





  • Log on to the Administration Console and click Logs > Display Settings.


  • Add Process to the list of columns to be displayed in the Log Viewer.


  • Refresh the log display.


  • Use the PID from this column to identify the process as follows:


UNIX
Run the following command, where pid is the process ID of the service process:
ps -ef | grep pid

Windows
Run Task Manager. Select the Processes tab. Scroll to the value in the PID column that is displayed in the PowerCenter Administration Console.
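On UNIX, the PID taken from the Process column can also be confirmed with ps directly. The sketch below uses the shell's own PID as a stand-in so it runs anywhere; substitute the PID shown in the Log Viewer.

```shell
#!/bin/sh
# Sketch: confirm which command a given PID belongs to.
# $$ (this shell's PID) stands in for the PID from the Log Viewer column.
PID=$$

# ps -p reports only the requested process; -o comm= prints just the command name.
NAME=`ps -p "$PID" -o comm=`
echo "PID $PID is running command: $NAME"
```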

PowerCenter session always runs on a different node than the one selected

Problem
There are two nodes (node01 and node02) running on a grid. The session must run on node01 because that is where the source file is located. Selecting node01 in the Resources option on the General tab does not force the session to run on node01; instead, the session runs on either node, sometimes node01, other times node02. Is there a setting to always run the session on node01?
Cause
This behavior occurs when the IgnoreResourceRequirements Integration Service property is enabled.
Solution


To resolve this, disable the IgnoreResourceRequirements property as follows:



  • In the Administration Console, select the Integration Service.

  • Under the Properties tab select Advanced Properties > Edit .

  • Clear the IgnoreResourceRequirements property.

  • Restart the Integration Service.

  • Run the workflow.

An Overview of Session Recovery in PowerCenter

Content
Recovery allows the user to restart a failed session and complete it as if the session had run without pause. When the Integration Service runs a session in recovery mode, it continues to commit data from the point of the last successful commit. This article explains the process of recovery and is aimed at users who will be incorporating this feature for recovering their sessions.
Requirements for Recovery Run
For a session to run in recovery mode, the following criteria need to be satisfied. Otherwise, an attempt to perform a recovery session run will fail.






  • The session is enabled for recovery. A Recovery Strategy (under Edit Session > Properties > General Options) must be selected for every individual session task that needs to be recovered (in previous versions the Enable Recovery option was under the Config Object tab in Session Properties). Sessions upgraded from a previous version of PowerCenter have recovery disabled by default.


  • The workflow/session is run in recovery mode. This can be done using the workflow run options Recover Task or Recover Workflow From Task. These options are available in the Workflow Monitor, the Workflow Manager, or pmcmd.


  • The previous session run did not succeed. This includes normal and recovery mode session runs.


  • The number of partitions is not changed between the normal and recovery run.


  • The recovery table created at the target database is not dropped or changed by the user.


  • The OS platform is not changed between the normal and recovery run.
Recovery Process
When you run a session enabled for recovery, the Integration Service creates and stores session status information in recovery tables and cache files. If the session fails and you recover the session, the Integration Service uses this status information to determine where the previous session failed and where to continue the session. The type of recovery the Integration Service performs depends on the target type. You can also choose to run part of the workflow when you recover the failed session.

Recovery Tables
When you enable recovery for a session that writes to a relational target, the Integration Service creates two tables at the target database, PM_RECOVERY and PM_TGT_RUN_ID. The Integration Service updates these tables during regular session runs with status information about the target load. When the Integration Service runs a failed session in recovery mode, it uses the information in the recovery tables to determine where the previous session failed, and then continues the session from that point. Do not edit or delete information in the recovery tables.
PM_RECOVERY
This table records the target load information during the session run per target instance per partition. In previous releases, this table was OPB_SRVR_RECOVERY. PM_RECOVERY saves the following information:


  • Repository ID. This allows multiple repositories to use the same recovery table.


  • Workflow ID


  • Folder ID


  • Session instance ID


  • Workflow run ID


  • Target instance ID. The Integration Service creates one row per target instance


  • Partition ID. The Integration Service creates one row per partition


  • Row count. Stores the last committed number of rows


  • Check point. Used with GMD sources only.
PM_TGT_RUN_ID
PM_TGT_RUN_ID is for internal use. It stores the latest unique number the Integration Service uses to identify the target instance. This table has a single field, LAST_TGT_RUN_ID.
Recovery Cache Files
As a result of GMD integration, the Integration Service also creates a pmgmd cache file in $PMCacheDir when a normal session that is enabled for recovery fails. The Integration Service creates the cache file in the following format:


pmgmd_metadata_[repository GUID]_[workflow ID]_[folder ID]_[session ID]_[transformation ID]_... .dat

Here is a sample cache file name:
pmgmd_metadata_7661f424_379f_11d7_947e_f63b53abfef7_103_2_102_0_0_0_1_1.dat




  • The Integration Service creates cache files for both relational and non-relational targets. The session will fail if either the recovery tables or the pmgmd cache files cannot be created due to insufficient privileges. When a failed session is run in recovery mode, the Integration Service uses the pmgmd cache files when it recovers the session. Do not edit or delete the cache files before the recovery run.
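As an illustration, leftover cache files from a failed run can be spotted with a simple listing. The directory and file name below are stand-ins (a temporary directory takes the place of the real $PMCacheDir so the sketch is self-contained).

```shell
#!/bin/sh
# Sketch: list recovery cache files left in the cache directory after a failed run.
# A demo directory stands in for the Integration Service's real $PMCacheDir.
PMCacheDir=${PMCacheDir:-/tmp/pmcache_demo}
mkdir -p "$PMCacheDir"
touch "$PMCacheDir/pmgmd_metadata_demo_1_1_1_1_0_0_1_1.dat"  # stand-in cache file

# A leftover pmgmd_*.dat file indicates a session awaiting recovery;
# do not delete it before the recovery run.
ls "$PMCacheDir"/pmgmd_*.dat
```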
Recovery Types
Internally, recovery can be divided into two types: incremental recovery and full load recovery.
In incremental load recovery, the Integration Service uses the ROW_COUNT information in the PM_RECOVERY table to determine the number of rows committed to the target. When the session is recovered, previously loaded rows are skipped. Thus, incremental recovery ensures that there are no duplicates loaded to the target. The Integration Service performs incremental recovery for relational, normal load targets.
In full load recovery, data is loaded from the beginning during the recovery run. The Integration Service performs full load recovery for relational bulk load, flat file (including regular, MQSeries, and external loader), XML, and SAP BW targets. Recovery for non-relational targets is done using the pmgmd file created in the $PMCacheDir cache directory.
Recovery and Truncation
During the recovery session run, the Integration Service ignores the Truncate Target Table option for relational normal load targets, since the load continues from the last commit point. For relational bulk load targets, the Integration Service allows the Truncate Target Table option in both normal and recovery runs.
At the end of a successful session run, the Integration Service deletes all recovery information in the target database and in $PMCacheDir. The Integration Service always resets the recovery information when it runs the session normally, regardless of whether the previous session run succeeded or failed.
Recovery Scenarios
Recovery can be performed under the following scenarios:


  • The workflow suspends. If the workflow suspends due to session failure, you can recover the failed session and resume the workflow.


  • The workflow fails. If the workflow fails as a result of session failure, you can recover the session and run the rest of the workflow.


  • The workflow completes, but a session fails. If the workflow completes, but a session fails, you can recover the session alone without running the rest of the workflow.
In order to run a recovery session, the following options are available in the Workflow Manager:




  • Recover Task. Use this option when you want to recover a single session.


  • Recover Workflow From Task. Use this option when you want to recover the session and continue the workflow.
The following options are available in the Workflow Monitor:


  • Resume/Recover. Use this option to resume a suspended workflow in recovery mode.


  • Recover Workflow From Task. Use this option when you want to recover the session and continue the workflow.


  • Recover Task. Use this option when you want to recover a single session.


You can also use one of the following pmcmd commands to run the session in recovery mode when you specify the -recovery option:





  • pmcmd startworkflow


  • pmcmd resumeworkflow


  • pmcmd resumeworklet


  • pmcmd starttask
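As a sketch, a command-line recovery run might be assembled as follows. The domain, service, user, folder, and workflow names are all placeholder assumptions, and the command is only built and printed rather than executed, since pmcmd is only available on a PowerCenter host.

```shell
#!/bin/sh
# Sketch: assemble a pmcmd recovery invocation. Every value below is a
# placeholder; substitute your own domain, service, user, folder, and
# workflow names.
DOMAIN=Domain_demo
INT_SVC=IS_demo
PM_USER=Administrator
FOLDER=MyFolder
WORKFLOW=wf_load_target

# The -recovery option asks the Integration Service to run the workflow in
# recovery mode. The command is echoed, not run, so this sketch is portable.
CMD="pmcmd startworkflow -sv $INT_SVC -d $DOMAIN -u $PM_USER -f $FOLDER -recovery $WORKFLOW"
echo "$CMD"
```

On a real host you would also supply the password (for example with -p or an environment variable) before running the command.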
Unrecoverable Sessions
PowerCenter does not support session recovery in the following cases:


  • The session uses non-pass-through partitioning. All partition points must be pass-through partitioning unless the partition point passes data to an Aggregator or Rank transformation. If Enable Recovery is enabled in the session properties, you can only select the default partition types for each partition point.


  • The session uses database partitioning. PowerCenter does not support recovery for a session that uses database partitioning.


  • You want to debug the mapping. You cannot perform recovery from the Debugger.


  • You want to perform a test load. You cannot perform a test load if the session is enabled for recovery. If you enable recovery,the Workflow Manager disables the Test Load option.
In general, it is safe to assume a changed (edited and saved) mapping or session will result in an unrecoverable session or a nondeterministic data load, depending on the extent of the changes. To perform consistent data recovery, the session properties for the recovery session must be the same as the session properties for the normal failed session. You might get inconsistent data if you perform recovery under the following circumstances:




  • The Integration Service runs in Unicode mode and you change the session sort order.


  • The Integration Service code page or source and target code pages change after the initial session failure.


  • Data movement mode changes after the initial session failure.


  • The session performs incremental aggregation and the Integration Service stops unexpectedly.


  • The mapping uses a Sequence Generator transformation.


  • The mapping uses a Normalizer transformation.


  • The sources or targets change after the initial session failure.


  • The mapping uses a Lookup transformation that performs a lookup on the target table, or the data in the lookup table changes between session runs.


  • An XML/VSAM source generates keys that the Integration Service uses to populate the target.


  • You edited the mapping in a way that causes the Integration Service to distribute data differently.

Monday, April 14, 2008

Informatica ETL Power Center 8

Extract, Transform and Load
The Informatica ETL product is called PowerCenter. It was one of the earliest ETL tools on the market and uses a user-friendly GUI designer interface to build jobs that run on one or more transformation servers. In PowerCenter 8 the product has pushdown optimization that executes transformation steps on a database server. This gives the tool an ETLT capability - extract, transform, load and transform.
PowerCenter 8 comes with options for expanding the capabilities of the ETL tool, these are a selection that impact how ETL jobs execute:
Partitioning Option: for partitioning data to process it in parallel across multiple CPUs or servers.
Real Time Option: for turning PowerCenter jobs into real time services.
Enterprise Grid Option: for automating the deployment and load balancing of PowerCenter jobs on a grid.
Metadata Management
Informatica has some technical metadata functions and reports in the PowerCenter product to support ETL development. In addition Informatica has the Metadata Manager and the Metadata Exchange Option.
In 2003 Informatica launched the SuperGlue metadata management solution for additional metadata capabilities. It is a web based solution for assembling, visualizing and analyzing metadata from various enterprise systems. In 2005 Informatica launched PowerCenter Advanced Edition that came bundled with SuperGlue. With the release of PowerCenter 8 Advanced Edition the SuperGlue name is gone and the product is now known as Metadata Manager, which is only available in this product bundle.
Informatica offers a PowerCenter option called Metadata Exchange that uses a wizard driven interface to import source definitions into PowerCenter. It provides more descriptive metadata and some metadata synchronization functions for when source metadata definitions change. It offers a combination of bi-directional and uni-directional exchange of metadata with products such as ERwin, Oracle Designer, Cognos Impromptu and Business Objects Designer.
Data Quality
In January of 2006 Informatica acquired Similarity Systems to considerably enhance its data quality and data profiling functionality. The Similarity ATHANOR product has been renamed Informatica Data Quality. It offers data cleansing, data matching and data quality reporting and monitoring. Informatica Data Quality processes can be added to a PowerCenter job within the PowerCenter Mapping Designer by selecting a data quality building block.
Informatica relies on additional third party vendors to deliver address validation capabilities to Data Quality. These products are licensed separately through Informatica.
There is also a PowerCenter Data Cleanse and Match option that was built by Informatica prior to the Similarity acquisition. It offers data cleansing and parsing and data matching, and is fully integrated with PowerCenter to seamlessly use the Data Partitioning and Enterprise Grid capabilities for scalable processing.
Data Profiling
In 2005 Similarity Systems acquired Evoke software to add the Evoke Axio profiling tool to the data quality suite and gain access to a wider customer base in the US and Asia. Similarity was in turn acquired by Informatica in January of 2006. Informatica renamed Evoke Axio to Information Data Explorer and offers it as a stand alone product or as a companion to PowerCenter.
Data Explorer provides data profiling functions to discover and define the content, structure and quality of data. It provides data mapping capabilities for source to target mapping that can be accessed from within PowerCenter.
PowerCenter also has a Mapping Generation Option that uses the free Informatica Data Stencil design tool for Microsoft Visio, which generates PowerCenter mappings from Visio designs or reverse engineers a PowerCenter mapping into a Visio mapping template.

Sunday, April 13, 2008

Unix Scripts and Common Used Commands

FTP Shell script

Bourne Shell script that connects to a given FTP server, executes a few commands (e.g. download and delete a file) and disconnects. This script assumes that you have the necessary permissions.

#!/bin/sh
#These can be easily converted to be read from input
#or as command line params
FTP_SRVR=myftp #or ip
FTP_USR=my_login
FTP_PWD=my_passwd #not secure (but use a tied down login)

#change this to do some generate filename trick
#this is just something out of the blue
TODAYS_FILE=`date | cut -c12-19`

#-n suppresses auto-login so the user command below supplies the credentials
ftp -n <<EOF
open $FTP_SRVR
user $FTP_USR $FTP_PWD
binary
get $TODAYS_FILE
delete $TODAYS_FILE
bye
EOF