Integrating Cloudera cluster with Active Directory (Part 3/3)

Paige Liu
In Part 1 and Part 2 of this blog, we covered the first five steps. Here we describe the remaining Cloudera-specific steps to install Cloudera and enable Kerberos and Single-Sign-On for web consoles.
  1. Deploy Active Directory with HA in Azure
  2. Deploy Linux VMs for the Cloudera cluster
  3. Enable Active Directory DNS on the Linux VMs
  4. Sync Linux VMs to Active Directory time service
  5. Join the Linux VMs to Active Directory and enable Single-Sign-On
  6. Install Cloudera
  7. Enable Kerberos on Cloudera
  8. Enable Single-Sign-On for Cloudera web consoles

Step 6: Install Cloudera

By the end of this step, we should have Cloudera installed on the Linux VMs, and we should be able to access the Cloudera Manager console using the Cloudera Manager admin credentials specified during installation.
1. Install Cloudera using the template azuredeploy_postad.json in this GitHub repo.  The following parameters must match what was created or modified in previous steps.
    • adminUserName: this could be the AD sudo user if the default user created with the VM has been disabled
    • adminPassword
    • dnsNamePrefix
    • adDomainName
    • nodeAddressPrefix
    • numberOfDataNodes
    • region
    • tshirtSize
2. Verify that Cloudera is installed correctly: RDP into a VM within the same VNet, open a browser, and go to http://<dnsNamePrefix>-mn0.<adDomainName>:7180.  Log in with the Cloudera Manager admin credentials specified in the template parameters.
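
The console address in the verification step can also be assembled and probed from a shell on any Linux VM in the VNet. The following is a minimal sketch, not part of the template: the values mycluster and bigdata.com are placeholders for your own dnsNamePrefix and adDomainName, and the helper names cm_url and cm_reachable are made up for illustration.

```shell
# Hypothetical values; use the dnsNamePrefix and adDomainName passed to the template.
dns_name_prefix="mycluster"
ad_domain_name="bigdata.com"

# Build the Cloudera Manager console URL for the mn0 host.
cm_url() {
  printf 'http://%s-mn0.%s:7180\n' "$1" "$2"
}

# Succeeds if the console answers within 5 seconds.
# Needs curl and network access to the VNet, so it is not called automatically here.
cm_reachable() {
  curl --silent --output /dev/null --max-time 5 "$1"
}

echo "Cloudera Manager URL: $(cm_url "$dns_name_prefix" "$ad_domain_name")"
```

From a VM inside the VNet, something like `cm_reachable "$(cm_url mycluster bigdata.com)" && echo reachable` gives a quick check before opening a browser.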

Step 7: Enable Kerberos on Cloudera

Follow the steps outlined in this Cloudera documentation to enable Kerberos on Cloudera with AD.  Most of the steps are fairly clear in the documentation, but the following steps may need more detail:
1. Since we deployed CentOS VMs, the default Kerberos encryption type is AES, so the JCE policy file is required.  The simplest way to install this file is to go to Cloudera Manager -> Hosts -> Re-run Upgrade Wizard -> check Install JDK and JCE policy file.  Note that this will revert some configuration values to their defaults, so note down these values beforehand and restore them after the installation.
2. We already installed the OpenLDAP client library on all Linux VMs.
3. Specify Kerberos encryption the same as the output of "klist -e" on a Linux VM:
4. When importing KDC account manager credentials, Cloudera Manager issues LDAP requests over SSL.  We need to enable LDAP over SSL on AD.
  • Add the server role "Certificate Authority" to the PDC.  After installation, complete the configuration with the default options.
  • Run mmc on the PDC, add the "Certificates" snap-in, then go to Computer Account -> Local Computer -> Personal -> Certificates -> All Tasks -> Request New Certificate to request a certificate for Kerberos authentication.  You may need to restart both the PDC and the BDC after this change.
5. If we created an Organizational Unit (OU) in AD, the credential used to generate other credentials must be granted delegation rights to administer user accounts in that OU:
6. Complete the wizard.  Now that we have enabled Kerberos, if we run the following commands on a Cloudera VM, we should get a security error:
sudo su hdfs
hdfs dfs -ls /
# should display a security error
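
Going back to step 3, the encryption types can be pulled out of the "klist -e" output with a small filter. This is only a sketch: it assumes the MIT Kerberos output format, where each ticket is followed by a line starting with "Etype (skey, tkt):", and the helper name etypes_from_klist and the sample text are made up for illustration. Your actual output may differ.

```shell
# Read `klist -e` output on stdin and print each distinct encryption type once.
# Assumes MIT Kerberos lines of the form "Etype (skey, tkt): <type>, <type>".
etypes_from_klist() {
  sed -n 's/^[[:space:]]*Etype (skey, tkt): //p' |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    sort -u
}

# Illustrative input only; on a cluster VM run: klist -e | etypes_from_klist
sample='Ticket cache: FILE:/tmp/krb5cc_0
Default principal: alice@BIGDATA.COM

Valid starting     Expires            Service principal
01/01/16 10:00:00  01/01/16 20:00:00  krbtgt/BIGDATA.COM@BIGDATA.COM
    Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96'

printf '%s\n' "$sample" | etypes_from_klist
```

The types printed this way are the ones to select in the Cloudera Manager Kerberos wizard.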

7. Create the Hadoop superuser hdfs in AD in the same NIS domain.  SSH in as hdfs@<domain name> and run the above command again; it should succeed.
8. Create Hadoop users in AD in the same NIS domain, create their home directories as hdfs, then log in as a Hadoop user and run a MapReduce job.

# log in as hdfs, create a home directory for each hadoop user
hdfs dfs -mkdir /user/alice
hdfs dfs -chown alice /user/alice

# log in as a hadoop user, for example alice, then run mapreduce
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
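
When there are several Hadoop users, the two home directory commands above can be wrapped in a loop. The sketch below is one possible way to do it, not something the Cloudera documentation prescribes: the user bob, the DRY_RUN switch, and the helper names run and make_home_dirs are assumptions for illustration. By default it only prints the commands, so they can be reviewed before running them as hdfs with a valid Kerberos ticket.

```shell
# Hypothetical user list; replace with the AD accounts that need HDFS home directories.
users="alice bob"

# DRY_RUN=1 (the default) prints the commands; set DRY_RUN=0 to execute them as hdfs.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create /user/<name> for each user and hand ownership to that user.
make_home_dirs() {
  for u in "$@"; do
    run hdfs dfs -mkdir "/user/$u"
    run hdfs dfs -chown "$u" "/user/$u"
  done
}

# Left unquoted on purpose so the list splits into one argument per user.
make_home_dirs $users
```

Running the script once with the default setting and then again with DRY_RUN=0 keeps mistakes in the user list from turning into stray HDFS directories.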

Step 8: Enable Single-Sign-On for Cloudera web consoles

  1. Follow the Cloudera documentation to enable Single-Sign-On using AD credentials. Once it is enabled, we will be prompted for user credentials when we open, for example, the YARN ResourceManager Web UI.  We need to provide the fully qualified user name, for example, someone@bigdata.com.
  2. To enable AD authentication for Cloudera Manager console, configure External Authentication:

Note that users must be explicitly added to the AD groups specified here.

Restart Cloudera Manager server:

service cloudera-scm-server restart
All done. In summary, we started from scratch, created an AD forest, deployed a Cloudera cluster, enabled DNS, and joined the Cloudera cluster VMs to AD.  Finally, we enabled authentication on Cloudera and its web consoles using the credentials managed by AD.