Solution
Recovery cache file names use the following format:
pmgmd_[sourcestate][sessionstate][targetstate]_
Example
pmgmd_sourcestate_50fc0313_9762_4670_98e9_7e2e5b7a93b8_268_5_191_61_0_0_1_1.dat
In this example:
50fc0313_9762_4670_98e9_7e2e5b7a93b8 is the Repository GUID
268 is the workflow ID
5 is the folder ID
191 is the session ID
61 is the transformation ID
To determine the name of the workflow, use the workflow ID in the following query:
SELECT WORKFLOW_NAME FROM REP_WORKFLOWS WHERE WORKFLOW_ID=268 AND SUBJECT_ID=5
To determine the name of the session and folder, use the session ID in the following query:
SELECT SESSION_NAME, SUBJECT_AREA FROM REP_LOAD_SESSIONS WHERE SESSION_ID=191 AND SUBJECT_ID=5
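The ID fields can also be pulled out of the cache file name directly. The following Bourne shell sketch splits the example file name above on underscores; the field positions are assumptions inferred from that one example and may differ for other cache files.

```shell
#!/bin/sh
#Split the sample recovery cache file name into its ID fields.
#Field positions (8=workflow, 9=folder, 10=session, 11=transformation)
#are taken from the example above and may vary for other file names.
FILE=pmgmd_sourcestate_50fc0313_9762_4670_98e9_7e2e5b7a93b8_268_5_191_61_0_0_1_1.dat
IDS=`echo "$FILE" | sed 's/\.dat$//'`
WORKFLOW_ID=`echo "$IDS" | awk -F_ '{print $8}'`
FOLDER_ID=`echo "$IDS" | awk -F_ '{print $9}'`
SESSION_ID=`echo "$IDS" | awk -F_ '{print $10}'`
TRANSFORMATION_ID=`echo "$IDS" | awk -F_ '{print $11}'`
echo "workflow=$WORKFLOW_ID folder=$FOLDER_ID session=$SESSION_ID transformation=$TRANSFORMATION_ID"
```

The extracted workflow and session IDs can then be substituted into the repository queries above.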
Tuesday, December 20, 2011
How to Determine the PowerCenter session that creates the PMGMD cache file
How to Determine the process associated with an Integration Service or Repository Service
Solution
If there are multiple Integration Services enabled under one node, there are multiple pmserver processes running on the same machine. To determine the process associated with an Integration Service or Repository Service, do the following:
- Log on to the Administration Console and click Logs > Display Settings.
- Add Process to the list of columns to be displayed in the Log Viewer.
- Refresh the log display.
- Use the PID from this column to identify the process as follows:
UNIX: Run the following command, where pid is the process ID of the service process:
ps -ef | grep pid
Windows: Run Task Manager and select the Processes tab. Scroll to the value in the PID column that is displayed in the PowerCenter Administration Console.
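The UNIX lookup can be wrapped in a small sketch. The PID below is a placeholder (the current shell's own PID is used so the sketch runs anywhere); substitute the value shown in the Log Viewer's Process column.

```shell
#!/bin/sh
#Given a PID from the Log Viewer's Process column, show which
#process it belongs to. $$ (this shell's PID) is a stand-in value.
PID=$$
CMD=`ps -p $PID -o comm= | sed 's/^ *//'`
echo "PID $PID runs $CMD"
```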
PowerCenter session always runs on a different node than the one selected
Problem
There are two nodes (node01 and node02) running on a grid. The session must run on node01 because that is where the source file is located. Selecting node01 in the Resources option on the General tab does not force the session to run on node01; instead the session runs on either node, sometimes node01, other times node02. Is there a setting to always run the session on node01?
Cause
This behavior occurs when the IgnoreResourceRequirements Integration Service property is enabled.
Solution
To resolve this, set IgnoreResourceRequirements to No as follows:
- In the Administration Console, select the Integration Service.
- Under the Properties tab, select Advanced Properties > Edit.
- Clear the IgnoreResourceRequirements property.
- Restart the Integration Service.
- Run the workflow.
An Overview of Session Recovery in PowerCenter
Content
Recovery allows the user to restart a failed session and complete it as if the session had run without pause. When the Integration Service runs a session in recovery mode, it continues to commit data from the point of the last successful commit. This article explains the process of recovery and is aimed at users who will be incorporating this feature for recovering their sessions.
Requirements for Recovery Run
For a session to run in recovery mode, the following criteria need to be satisfied. Otherwise, an attempt to perform a recovery session run will fail.
- The session is enabled for recovery. A Recovery Strategy (under Edit Session > Properties > General Options) must be selected for every individual session task that needs to be recovered (in previous versions the Enable Recovery option was under the Config Object tab in Session Properties). Sessions upgraded from a previous version of PowerCenter have recovery disabled by default.
- The workflow/session is run in recovery mode. This can be done using the workflow run options Recover Task or Recover Workflow From Task. These options are available in the Workflow Monitor or Workflow Manager, or using pmcmd.
- The previous session run did not succeed. This includes normal and recovery mode session runs.
- The number of partitions is not changed between the normal and recovery run.
- The recovery table created at the target database is not dropped or changed by the user.
- The OS platform is not changed between normal and recovery run.
Recovery Process
When you run a session enabled for recovery, the Integration Service creates and stores session status information in recovery tables and cache files. If the session fails and you recover the session, the Integration Service uses this status information to determine where the previous session failed and where to continue the session. The type of recovery the Integration Service performs depends on the target type. You can also choose to run part of the workflow when you recover the failed session.
Recovery Tables
When you enable recovery for a session that writes to a relational target, the Integration Service creates two tables at the target database, PM_RECOVERY and PM_TGT_RUN_ID. The Integration Service updates these tables during regular session runs with status information about the target load. When the Integration Service runs a failed session in recovery mode, it uses the information in the recovery tables to determine where the previous session failed, and then continues the session from that point. Do not edit or delete information in the recovery tables.
PM_RECOVERY
This table records the target load information during the session run per target instance per partition. In previous releases, this table was OPB_SRVR_RECOVERY. PM_RECOVERY saves the following information:
- Repository ID. This allows multiple repositories to use the same recovery table.
- Workflow ID
- Folder ID
- Session instance ID
- Workflow run ID
- Target instance ID. The Integration Service creates one row per target instance
- Partition ID. The Integration Service creates one row per partition
- Row count. Stores the last committed number of rows
- Check point. Used with GMD sources only.
PM_TGT_RUN_ID
PM_TGT_RUN_ID is for internal use. It stores the latest unique number the Integration Service uses to identify the target instance. This table has a single field, LAST_TGT_RUN_ID.
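To see what the Integration Service recorded for a failed run, the tables can be inspected read-only at the target database. The statements below are a sketch; only LAST_TGT_RUN_ID is a documented column name here, so PM_RECOVERY is queried with SELECT *. Remember not to edit or delete rows.

```shell
#!/bin/sh
#Build read-only checks against the recovery tables; pipe the output
#into your target database's SQL client (sqlplus, db2, isql, ...).
SQL_RECOVERY="SELECT * FROM PM_RECOVERY"
SQL_RUN_ID="SELECT LAST_TGT_RUN_ID FROM PM_TGT_RUN_ID"
printf '%s;\n%s;\n' "$SQL_RECOVERY" "$SQL_RUN_ID"
```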
Recovery Cache Files
As a result of GMD integration, the Integration Service also creates a pmgmd cache file in $PMCacheDir when a normal session that is enabled for recovery fails. The Integration Service creates the cache file in the following format:
pmgmd_metadata_[repository GUID]_[workflow ID]_[folder ID]_[session ID]_[transformation ID]_..._.dat
Here is a sample cache file name:
pmgmd_metadata_7661f424_379f_11d7_947e_f63b53abfef7_103_2_102_0_0_0_1_1.dat
- The Integration Service creates cache files for both relational and non-relational targets. The session will fail if either the recovery tables or the pmgmd cache files cannot be created due to insufficient privileges.
When a failed session is run in recovery mode, the Integration Service uses the pmgmd cache files when it recovers the session. Do not edit or delete the cache files before the recovery run.
Recovery Types
Internally, recovery can be divided into two types: incremental recovery and full load recovery.
In incremental load recovery, the Integration Service uses the ROW_COUNT information in the PM_RECOVERY table to determine the number of rows committed to the target. When the session is recovered, previously loaded rows will be skipped. Thus, incremental recovery ensures that there are no duplicates loaded to the target. The Integration Service performs incremental recovery for relational, normal load targets.
In full load recovery, data is loaded from the beginning during the recovery run. The Integration Service performs full load recovery for relational bulk load, flat file (including regular, MQSeries, and external loader), XML, and SAP BW targets. Recovery for non-relational targets is done using the pmgmd file created at the $PMCacheDir cache directory.
Recovery and Truncation
During the recovery session run, the Integration Service ignores the Truncate Target Table option for relational normal load targets, since the load will continue from the last commit point. For relational bulk load targets, the Integration Service allows the Truncate Target Table option in both normal and recovery runs.
At the end of a successful session run, the Integration Service deletes all recovery information in the target database and in $PMCacheDir. The Integration Service always resets the recovery information when it runs the session normally, regardless of whether the previous session run succeeded or failed.
Recovery Scenarios
Recovery can be performed under the following scenarios:
- The workflow suspends. If the workflow suspends due to session failure, you can recover the failed session and resume the workflow.
- The workflow fails. If the workflow fails as a result of session failure, you can recover the session and run the rest of the workflow.
- The workflow completes, but a session fails. If the workflow completes, but a session fails, you can recover the session alone without running the rest of the workflow.
The following options are available in the Workflow Manager:
- Recover Task. Use this option when you want to recover a single session.
- Recover Workflow From Task. Use this option when you want to recover the session and continue the workflow.
The following options are available in the Workflow Monitor:
- Resume/Recover. Use this option to resume a suspended workflow in recovery mode.
- Recover Workflow From Task. Use this option when you want to recover the session and continue the workflow.
- Recover Task. Use this option when you want to recover a single session.
You can also use one of the following pmcmd commands to run the session in recovery mode when you specify the -recovery option:
- pmcmd startworkflow
- pmcmd resumeworkflow
- pmcmd resumeworklet
- pmcmd starttask
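A recovery invocation can be assembled as in the sketch below. The service, domain, folder, workflow, and task names are placeholders, and the command is echoed rather than executed so it can be reviewed before running it on a machine where pmcmd is installed (credentials and exact option order should be checked against your pmcmd reference).

```shell
#!/bin/sh
#Assemble a pmcmd recovery call; all names are placeholders.
SVC=IS_dev
DOM=Domain_dev
FOLDER=MyFolder
WF=wf_load
TASK=s_load
CMD="pmcmd starttask -sv $SVC -d $DOM -f $FOLDER -w $WF -recovery $TASK"
echo "$CMD"
```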
Unrecoverable Sessions
PowerCenter does not support session recovery in the following cases:
- The session uses non-pass-through partitioning. All partition points must be pass-through partitioning unless the partition point passes data to an Aggregator or Rank transformation. If Enable Recovery is enabled in the session properties, you can only select the default partition types for each partition point.
- The session uses database partitioning. PowerCenter does not support recovery for a session that uses database partitioning.
- You want to debug the mapping. You cannot perform recovery from the Debugger.
- You want to perform a test load. You cannot perform a test load if the session is enabled for recovery. If you enable recovery,the Workflow Manager disables the Test Load option.
- The Integration Service runs in Unicode mode and you change the session sort order.
- The Integration Service code page or source and target code pages change after the initial session failure.
- Data movement mode changes after the initial session failure.
- The session performs incremental aggregation and the Integration Service stops unexpectedly.
- The mapping uses a Sequence Generator transformation.
- The mapping uses a Normalizer transformation.
- The sources or targets change after the initial session failure.
- The mapping uses a Lookup transformation that performs a lookup on the target table, or the data in the lookup table changes between session runs.
- An XML/VSAM source generates keys that the Integration Service uses to populate the target.
- You edited the mapping in a way that causes the Integration Service to distribute data differently.
Monday, April 14, 2008
Informatica ETL Power Center 8
PowerCenter 8 comes with options for expanding the capabilities of the ETL tool. These are a selection that impact how ETL jobs execute:
Partitioning Option: for partitioning data to process it in parallel across multiple CPUs or servers.
Real Time Option: for turning PowerCenter jobs into real time services.
Enterprise Grid Option: for automating the deployment and load balancing of PowerCenter jobs on a grid.
In 2003 Informatica launched the SuperGlue metadata management solution for additional metadata capabilities. It is a web based solution for assembling, visualizing and analyzing metadata from various enterprise systems. In 2005 Informatica launched PowerCenter Advanced Edition that came bundled with SuperGlue. With the release of PowerCenter 8 Advanced Edition the SuperGlue name is gone and the product is now known as Metadata Manager and is only available in this product bundle.
Informatica offers a PowerCenter option called Metadata Exchange that uses a wizard driven interface to import source definitions into PowerCenter. It provides more descriptive metadata and some metadata synchronization functions for when source metadata definitions change. It offers a combination of bi-directional and uni-directional exchange of metadata to products such as ERwin, Oracle Designer, Cognos Impromptu and Business Objects Designer.
Informatica relies on additional third party vendors to deliver address validation capabilities to Data Quality. These products are licensed separately through Informatica.
There is also a PowerCenter Data Cleanse and Match option that was built by Informatica prior to the Similarity acquisition. It offers data cleansing and parsing, and data matching, and is fully integrated with PowerCenter to seamlessly use the Data Partitioning and Enterprise Grid capabilities for scalable processing.
Data Explorer provides data profiling functions to discover and define the content, structure and quality of data. It provides data mapping capabilities for source to target mapping that can be accessed from within PowerCenter.
PowerCenter also has a Mapping Generation Option that uses the free Informatica Data Stencil design tool for Microsoft Visio that generates PowerCenter mappings from Visio designs or reverse engineers a PowerCenter mapping into a Visio mapping template.
Sunday, April 13, 2008
Unix Scripts and Common Used Commands
FTP Shell script
Bourne shell script that connects to a given FTP server, executes a few commands (e.g. download and delete a file), and disconnects. This script also assumes that you have the necessary permissions.
#!/bin/sh
#These can be easily converted to be read from input
#or as command line params
FTP_SRVR=myftp #or ip
FTP_USR=my_login
FTP_PWD=my_passwd #not secure (but use a tied down login)
#change this to use some generated-filename trick
#this is just something out of the blue
TODAYS_FILE=`date | cut -c12-19`
#-n suppresses auto-login so the user command below is used instead
ftp -n $FTP_SRVR <<EOF
user $FTP_USR $FTP_PWD
binary
get $TODAYS_FILE
delete $TODAYS_FILE
bye
EOF