Thursday, December 29, 2011

Parallelism of scenarios

Let's take a deep dive into the parallelism capabilities of ODI using StartScen in Asynchronous mode.
Here is a nice article from Christophe (http://blogs.oracle.com/dataintegration/entry/parallel_processing_in_odi) that illustrates how this can be achieved.
However, the important thing to understand here is that the job kick-off process is not truly parallel.

A truly parallel process should look like the following where the Start will spawn 3 parallel processes.

Whereas in ODI, the above behavior is mimicked by connecting jobs sequentially to each other and then executing them in Asynchronous mode.

The question arises - what's the difference, as long as the jobs are executing in parallel? A slight one.
Consider a case where the scenario "StartEmpLoad" cannot start for some reason (the scenario is not available, or the agent selected for it is down). The next (or did we say parallel?) job "StartSalesLoad" will then never be kicked off: since the two are connected with an "OK" link and the first job wasn't OK, the flow never reaches the next step. This is a job kick-off error, and it is different from the data errors that would normally be encountered during job execution (duplicate keys, data not found, etc.).

Each StartScen here kicks off a job and then moves on to the next StartScen. Being asynchronous, they don't wait for the results to come back, but they do make sure that the job was kicked off successfully (as opposed to completed successfully).
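As a sketch, one of the StartScen steps in such a package would use the OdiStartScen tool with SYNC_MODE set to 2 (asynchronous); the scenario name and version below are placeholders:

    OdiStartScen "-SCEN_NAME=EMP_LOAD" "-SCEN_VERSION=001" "-CONTEXT=GLOBAL" "-SYNC_MODE=2"

With SYNC_MODE=2, the step returns as soon as the session is successfully started, which is exactly the "kicked off successfully vs completed successfully" distinction above.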

Wednesday, December 28, 2011

ODI Tools without Hardcodings

ODI provides us with a set of utilities that can assist us in day to day tasks such as Ftp, FileCopy, SendMail, SQLUnload, etc. These tools make ODI powerful.
These tools are available in the Toolbox (once you start creating a package).

A common problem with all these tools is the need to hardcode file paths, user names, passwords, etc. This is the least desirable trait of any tool. As a developer, you want the flexibility to change parameters whenever the need arises. Unfortunately, the way the tools are laid out, this change is not very intuitive.
It seems that the values need to be changed and then the scenarios need to be recompiled.

Wait ... did I say recompile the scenarios just to change a parameter? Ohh. Such a basic problem.
C'mon. It's better to write programs in C, where parameters can be passed from the command line using argv[].

How did such an excellent tool miss this one?
OK, OK. Let's explore a couple of ways in which life can be simplified.

  1. Use them as a Sunopsis API - Each tool can be used as a Sunopsis API in an ODI procedure. It's very easy to see the syntax of the corresponding API for each tool: once you drag-drop the tool inside a package, click on the "Command" tab in the Properties Explorer. As you keep adding parameters on the "General" tab, the "Command" tab keeps getting updated with the values. ODIExperts describe an excellent way of invoking ODI tools inside an ODI procedure - http://odiexperts.com/calling-odisqlunload-using-odi-procedure-with-no-hardcoded-password
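For instance, inside an ODI procedure step (technology: Sunopsis API), the tool call can reference ODI variables instead of literals; the variable names below are hypothetical:

    OdiFileCopy "-FILE=#PROJECT.V_SRC_FILE" "-TOFILE=#PROJECT.V_TGT_FILE" "-OVERWRITE=yes"

Changing the source or target path then only requires refreshing the variables, not regenerating the scenario.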

  2. Use ODI variables - Define an ODI Refresh variable for each parameter that you want to parameterize. Before invoking the tool, refresh the ODI variable from a database configuration table, then use this ODI variable in the parameters section of the tool. The following snapshots illustrate how this can be done.
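For example, a refresh variable's refreshing query might read its value from a configuration table (the table and column names here are illustrative):

    select param_value
    from odi_config_params
    where param_name = 'FTP_USER'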


    A drawback of this method is that the package must contain several variables, which may make it look too full, with too many connections.

  3. Use Jython variables - This method is an enhancement of the one above. The only difference is the use of Jython variables instead of ODI variables (and a bit more :D). The real advantage of this approach is a less cluttered ODI package, since all the variables being used are created inside an ODI procedure.
    This will be a 2 step process:
    • Create and Refresh all the Jython variables in an ODI procedure.
    • Use those variables in parameters of ODI Tool inside the package.

    In the example provided here, the OdiSqlUnload tool is used, and the Jython variables hold the connection information. The procedure that contains the commands to declare and populate the Jython variables will look as follows:
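Since the original snapshot is not shown here, this is a minimal sketch of what that Jython task could contain. The connection values are placeholders; in practice they would be fetched from a configuration table rather than assigned literally:

```python
# Jython task in an ODI procedure (technology: Jython)
# Placeholder values -- in a real setup these would be read
# from a configuration table instead of being hardcoded.
jvJDBC = 'oracle.jdbc.OracleDriver'            # JDBC driver class
jvURL  = 'jdbc:oracle:thin:@myhost:1521:ORCL'  # JDBC connection URL
jvUser = 'SCOTT'                               # database user
jvPass = 'tiger'                               # password (better: fetch/decrypt at runtime)
```

The variables declared here (jvJDBC, jvURL, jvUser, jvPass) are the ones referenced later in the OdiSqlUnload command via the <@=...@> substitution.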


    Then the package will be created, where this procedure will precede the ODI tool OdiSqlUnload.

    The OdiSqlUnload tool's command should be configured to look similar to this:

    OdiSqlUnload "-FILE=/MyPath/sqlunloadfile.txt" "-DRIVER=<@=jvJDBC@>" "-URL=<@=jvURL@>" "-USER=<@=jvUser@>" "-PASS=<@=jvPass@>" "-FILE_FORMAT=VARIABLE" "-FIELD_SEP=," "-ROW_SEP=\r\n" "-DATE_FORMAT=yyyy/MM/dd HH:mm:ss" "-CHARSET_ENCODING=ISO8859_1" "-XML_CHARSET_ENCODING=ISO-8859-1" "-FETCH_SIZE=5000"
    select emp_no from emp
The above method illustrates an effective usage of Jython variables.
In a similar fashion other values can be refreshed from a DB configuration table and then used inside the ODI tools.

Hope this helps in laying a path towards cleaner usage of ODI tools without hardcoding.

Wednesday, April 20, 2011

When Temporary is not Temporary

Let's have a little questionnaire about the temporary interfaces used in Oracle Data Integrator. They are also popularly known as "Yellow Interfaces" because of the yellow icon ODI gives them at creation, as opposed to the blue icon of a regular interface.



Which of the following statements are false about Temporary interfaces:
  1. The tables get dropped once the calling interface is complete.
  2. The data in the temporary interfaces is deleted once the calling interface is complete.
  3. The table referred to as Target in the Temporary interface is created each time the calling interface invokes it.
  4. They can only be executed in Sunopsis Memory Engine.

Answer Choices:
1 is false
2 is false
3 is false
1 and 2 are false
1 and 3 are false
2 and 4 are false
All of the above



The answer is "All of the above" - every one of the statements is false.

I will elucidate upon each of the above points and remove common misconceptions:
  1. The table that gets created stays there forever. It is a physical table that exists.
  2. The data in the temporary interface also stays there; it has to be removed manually or overwritten. Even though this sounds similar to global temporary tables in Oracle, it is not the same.
  3. This depends on how you have configured your interface. If the option "CREATE_TARG_TABLE" is set to "Yes", then ODI will attempt to create the table each time; otherwise, it will NOT. If the table already exists, the statement to create it will raise an error (warning).
  4. The fact that you can create a temporary interface using the Sunopsis Memory Engine doesn't mean it cannot be created in the RDBMS you are working with. You can also create temporary interfaces in the database of your choice; the temporary interface being in memory is coincidental to the idea of a temporary datastore. If the Sunopsis Memory Engine is chosen as the "Work Schema", the temporary table will remain there until the engine is rebooted/shut down or explicit code to drop the table is issued.
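For contrast, an Oracle global temporary table really does purge its rows automatically, which is precisely what an ODI temporary interface's target table does not do:

    create global temporary table tmp_emp (emp_no number)
    on commit delete rows;

Rows in such a table disappear at commit (or at end of session with ON COMMIT PRESERVE ROWS), whereas the "temporary" table behind a yellow interface keeps its data until you remove it.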

A very important statement -
There is nothing temporary about a temporary interface, but its name.

The term "temporary" comes into the picture because the temporary area (or staging area) can be purged after executions - though it has to be purged manually or by a process.
The advantage of using a yellow interface is ONLY to avoid the actual preparation of the destination datastore in the database, or the creation of this datastore in the "Model". Since these are deemed temporary, theoretically they are not part of our data model and hence don't need to be in the "Model".

In all cases, the destination data is written to a database (either in memory or in an RDBMS), depending on which staging area has been chosen.

Typically, you will want to create a temporary interface that is invoked as a source in another, permanent (blue) interface. A yellow interface may be invoked as a source in more than one blue interface.
In that case, each of the blue interfaces will share the same table (populated by the yellow interface). So if you are trying to run those jobs in parallel, you may want to redesign the usage of the yellow interface.

The temporary interface has an option to choose its datastore location: Data Schema or Work Schema. If the Work Schema is chosen and the Memory Engine is the staging area, the table will remain there until the engine is shut down or explicit code to drop the table is issued.

Wednesday, February 23, 2011

ODI startscen.sh forks local agent for a scenario

I will discuss the behavior of the ODI startscen.sh/bat command. Often, we use this command to execute a scenario from custom shell/batch scripts, or we might even add it to our workflow engine.
The syntax is:
startscen.sh scn_load_data 001 GLOBAL -NAME=odiSchedAgent001

However, there is an important point to know about the underlying functioning of the startscen command: it forks a new agent process that is local to the machine on which the command was started.
startscen.sh has a -NAME=<agentName> parameter, which naturally makes us assume that this agent will be used to execute the scenario. However, this parameter is only used to record the agent name in the Operator logs. It has nothing to do with using that agent to execute the scenario.
Another obvious side effect is that the OS execution privileges of the user who initiated the startscen command are used to generate log files and any other OS files; these may differ from the execution privileges of the remote agent you intended to use.

Consider a small test case:

(1) Use Unix account "A" to start a background agent process by agentscheduler.sh.

agentscheduler.sh -PORT=20910 -NAME=odiSchedAgent001

(2) On the same Unix machine, use Unix account "B" to execute the scenario "scn_load_data" startscen.sh

startscen.sh scn_load_data 001 GLOBAL -NAME=odiSchedAgent001

However, in step (2), startscen forks a new agent process to execute the scenario instead of using the existing agent process started in (1). You can verify this by scanning the processes:
on Unix, run "ps -ef" and you will see that a new process has been initiated.

Even if you use the same Unix account "A" to execute startscen.sh, it will still fork a new agent process.

This may be a problem. As I said earlier, the permissions of the user who initiated the agent and those of the user who initiated the startscen command may differ.
However, there are cases in which this becomes a bigger problem.
Consider a setup with two Unix systems - System-A and System-B. System-A runs the agent process, and System-B is used to kick off the scenario using the startscen command. Any OS-based activities performed by the scenario will be done on the file system of System-B, whereas you wanted them done on System-A (after all, that's where the agent is running. Or so you thought).

Now the good part... the solution:
Use startcmd.sh instead of startscen.sh.
The syntax is:
startcmd.sh OdiStartScen -SCEN_NAME=scn_load_data -SCEN_VERSION=001 -CONTEXT=GLOBAL -AGENT_CODE=odiSchedAgent001
This command actually uses the remote agent specified in the parameter.

Tuesday, February 1, 2011

ODI 11g Upgrade issues - Set operator disabled

In this post, I will consolidate the issues that I had with an upgrade from ODI 10g to ODI 11g. This is strictly valid for ODI 11g instances that have been migrated from ODI 10g.

1.)
During the upgrade from 10g to 11g, if you forget to upgrade the Technologies, you will find that the SET-based functionality doesn't work. Thanks to Nicolas for pointing this out.
In an interface mapping, you cannot add multiple datasets. The Add/Delete buttons for the datasets are disabled.



The cause -
The Oracle Technology was not upgraded to support these new features. So, you will see the following in the Topology Manager -> Physical Architecture -> Technologies -> Oracle


The options highlighted above are unchecked. Hence, the functionalities of Partitioning, Native Sequences, Lookups, Derived Tables and Set Operators are not available.

Solution -
You will need to make changes to the Oracle Technology as follows: