1. Go to Cloudera Manager, select HDFS, and then select Configuration. Search for "core-site" and add the following property to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, replacing the placeholders with your storage account name and access key:
Name: fs.azure.account.key.<your_storage_account>.blob.core.windows.net
Value: <your_storage_access_key>
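As a sketch, the safety-valve entry above corresponds to a core-site.xml property like the following; `mystorageaccount` and the key value are placeholders, not real credentials:

```xml
<!-- Grants Hadoop's WASB driver access to one storage account.
     Replace the account name and the key with your own values. -->
<property>
  <name>fs.azure.account.key.mystorageaccount.blob.core.windows.net</name>
  <value>YOUR_BASE64_STORAGE_ACCESS_KEY==</value>
</property>
```

Because the account name is part of the property name, you can add one such property per storage account the cluster needs to reach.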
2. Redeploy stale client configurations, then restart all Cloudera services from Cloudera Manager.
3. To test that Cloudera can access files in Azure Storage, upload a few test files to your storage container. You can use command-line tools such as AzCopy, or UI tools such as Visual Studio Server Explorer or Azure Storage Explorer.
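For example, a single file can be uploaded with AzCopy (v10 syntax). The account name, container name, file name, and SAS token below are hypothetical placeholders:

```shell
# Placeholder values -- substitute your own account, container, and SAS token.
ACCOUNT="mystorageaccount"
CONTAINER="mycontainer"
SAS_TOKEN="<your_sas_token>"

# Copy a local test file into the container.
azcopy copy "./testfile.txt" \
  "https://${ACCOUNT}.blob.core.windows.net/${CONTAINER}/testfile.txt?${SAS_TOKEN}"
```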
4. SSH into any Cloudera node and run the following command. You may see some warnings, but confirm that the command lists the files in your Azure storage account. Note that if you don't specify a destination folder name, you must include the trailing slash in the wasb URL, as shown in the following example:
hdfs dfs -ls wasb://<your_container>@<your_account>.blob.core.windows.net/
5. Run distcp on any Cloudera node to copy data from HDFS to Azure Storage.
# Run this command under a user who has access to the source HDFS files,
# for example, the HDFS superuser hdfs
hadoop distcp /<hdfs_src> wasb://<your_container>@<your_account>.blob.core.windows.net/
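For larger copies, distcp's standard options can help; a sketch with hypothetical source path, container, and account names:

```shell
# Placeholder names -- replace with your own HDFS path and storage account.
SRC="/data/warehouse"
DEST="wasb://mycontainer@mystorageaccount.blob.core.windows.net/warehouse"

# -m caps the number of parallel map tasks; -update copies only files that
# are missing or differ at the destination, so reruns are incremental.
hadoop distcp -m 20 -update "${SRC}" "${DEST}"

# Verify the copy by listing the destination.
hdfs dfs -ls "${DEST}"
```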
The source HDFS content should now appear in your Azure storage account.
For more information about Hadoop support for Azure Storage, see the Hadoop Azure support documentation.