Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 takes core capabilities from Azure Data Lake Storage Gen1 such as a Hadoop compatible file system, Azure Active Directory and POSIX based ACLs and integrates them into Azure Blob Storage. This combination enables best in class analytics performance along with Blob Storage’s tiering and data lifecycle management capabilities and the fundamental availability, security and durability capabilities of Azure Storage. The simplest way to enable Hadoop applications to work with cloud storage is to build a Hadoop File System driver which runs client-side within the Hadoop applications. This driver emulates a file system, converting Hadoop file system operations into operations on the backend of the respective platforms. This is often inefficient, and correctness is hard to implement/achieve. For instance, when implemented on an object store a request by a Hadoop client to rename a folder (a common operation in Hadoop jobs) can result in many REST requests. This is...