Azkaban Job Type: HadoopJava

cjyu edited this page Mar 24, 2013 · 2 revisions

Introduction

Azkaban2 is a ground-up redesign of the old Azkaban. One of the design goals is to make Azkaban robust and flexible. In the old version, the job executors that actually run user jobs got in the way: any change to any job executor meant upgrading the whole package.

So in Azkaban2, the job executors are carved out as plugins. This way, we can add as many different job executor plugins as we want: for Hive, for Pig, etc., and for different versions of them. We can also add job executors that work with different versions of Hadoop without touching the Azkaban2 core.

Here is a new job type that is introduced in Azkaban2:

HadoopJava

In large part, this is the same "java" type that was in the old Azkaban. The difference is mainly in security. The next section explains how Hadoop tokens work. If you only want to learn how to use the job type, jump to the "How To Use" section.

Hadoop Tokens

In the old Azkaban java type, the Azkaban process hands the Kerberos keytab information to the user process. The user program is wrapped by JavaRunnerMain, which uses that keytab information to log in, gets a TGT from the KDC, and then proxies as the user. Handing the keytab to user code is obviously dangerous for enterprise clusters in a company like LinkedIn.

In contrast, HadoopJava handles Hadoop security using Hadoop tokens. Just before the job runs, Azkaban2 checks which "hadoop user" the "Azkaban2 user" is trying to run as. Azkaban2 then asks the NameNode and the JobTracker for delegation tokens on behalf of that user. The token file location is set in the environment of the user process (HADOOP_TOKEN_FILE_LOCATION). However, the Hadoop JobClient does not pick up the token file directly; one has to set it explicitly in the job conf. If one wants to run several different MapReduce jobs, one also needs to make sure the tokens are not canceled when a MapReduce job completes.
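The two settings described above can be sketched as a small helper. This is only an illustration under stated assumptions: the class and method names are hypothetical, the environment variable and the mapreduce.job.credentials.binary key are the ones used in the "How To Use" section, and the key for keeping tokens alive is not named on this page, so the sketch assumes Hadoop's mapreduce.job.complete.cancel.delegation.tokens property. To stay free of Hadoop dependencies, it builds a plain map rather than a real job conf.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Azkaban2 or Hadoop.
public class TokenJobConf {

    /** Returns the job conf properties implied by the given environment. */
    public static Map<String, String> tokenProperties(Map<String, String> env) {
        Map<String, String> props = new LinkedHashMap<>();
        String tokenFile = env.get("HADOOP_TOKEN_FILE_LOCATION");
        if (tokenFile != null) {
            // Tell the job client where the delegation tokens live.
            props.put("mapreduce.job.credentials.binary", tokenFile);
            // Assumption: this Hadoop property keeps the tokens from being
            // canceled when one MapReduce job finishes, so later jobs can reuse them.
            props.put("mapreduce.job.complete.cancel.delegation.tokens", "false");
        }
        return props;
    }

    public static void main(String[] args) {
        // In a real job, apply each entry with conf.set(key, value).
        tokenProperties(System.getenv())
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

When no token file is present in the environment, the helper returns an empty map, so the same code can run unchanged outside Azkaban2.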

Since the tokens are granted by Azkaban2 at the beginning of the job run, the job has to obtain new tokens, renew the tokens, or finish before the tokens expire. For security reasons, the default behavior is to let the job tokens expire without renewing them.

How To Use

For the most part, job packages that worked with the old java type should also work with hadoopJava: just change the job type in the .job files. Azkaban2 should obtain the token and set it in your job conf.
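As a rough illustration, a converted .job file might look like the fragment below. Only the type value comes from this page; the file name and the job.class value are hypothetical placeholders, and every other property should stay as it was in your existing java-type job.

```properties
# wordcount.job (hypothetical file name)
type=hadoopJava
# Hypothetical class name; keep whatever your existing job already specifies.
job.class=com.example.WordCount
```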

However, some libraries wipe the job conf clean on startup, and all the token information is lost.

In this case, add the following code after creating the new job conf. (In this example, the job conf is obtained by creating a new Configuration object.)

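    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();

    // Point the job conf at the token file, if one is set in the environment.
    if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
        conf.set("mapreduce.job.credentials.binary", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
    }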

Job Package Example:

See plugins/jobtype/examples/java-wc. In that directory, run "zip java-wc ./* -r" to build the zip package, then upload it to Azkaban.
