Azure Data Lake
Developer(s) | Microsoft |
---|---|
Initial release | November 16, 2016 |
Available in | English |
Type | Data storage and analytics service |
Website | azure |
Azure Data Lake[1] is a scalable data storage and analytics service. The service is hosted in Azure, Microsoft's public cloud.
History
The Azure Data Lake service was released on November 16, 2016. It is based on COSMOS,[2] which is used to store and process data for applications such as Azure, AdCenter, Bing, MSN, Skype and Windows Live. COSMOS features a SQL-like query engine called SCOPE, upon which U-SQL was built.[2]
Storage
Data Lake Storage is a cloud service for storing structured, semi-structured or unstructured data produced by sources such as social networks, relational databases, sensors, videos, web apps, and mobile or desktop devices. A single account can store trillions[3] of files, and a single file can be larger than a petabyte.
Analytics
Data Lake Analytics is a parallel, on-demand job service. Its parallel processing system is based on Microsoft Dryad,[4] which can represent arbitrary directed acyclic graphs (DAGs) of computation. Data Lake Analytics provides a distributed infrastructure that can dynamically allocate resources, so that customers pay only for the services they use. The system uses Apache YARN, the part of Apache Hadoop that governs resource management across clusters. Data Lake Store supports any application that uses the Hadoop Distributed File System (HDFS) interface.[4]
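To illustrate what it means to represent a job as a directed acyclic graph, the following Python sketch groups the vertices of a small, hypothetical job graph into stages that could run in parallel. It is not Dryad or Data Lake Analytics code: the vertex names and the `run_in_stages` helper are assumptions made for illustration, and the scheduling uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key is a vertex (a unit of computation) and
# each value is the set of vertices whose output that vertex consumes.
job_graph = {
    "extract_a": set(),
    "extract_b": set(),
    "join": {"extract_a", "extract_b"},
    "aggregate": {"join"},
    "output": {"aggregate"},
}


def run_in_stages(graph):
    """Group vertices into stages; vertices in one stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        # Every vertex returned here has all of its inputs already completed.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = run_in_stages(job_graph)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

In this sketch the two extract vertices land in the same stage because neither depends on the other, which is the property a distributed engine can exploit to run them concurrently.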
U-SQL
U-SQL is a query language for writing parallel data transformation and processing programs in Data Lake Analytics. It combines SQL and C#: it is an evolution of the declarative SQL language with native extensibility through user code written in C#. U-SQL uses C# data types and the C# expression language.
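The combination of a declarative query with user-written code can be sketched by analogy in Python. This is not U-SQL and does not use any Azure API. The `SearchEvent` type, the `normalize` function and the sample rows are hypothetical and exist only to show a select-and-filter step whose expressions call ordinary user code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical row type, standing in for a typed rowset schema.
    user: str
    query: str
    duration_ms: int


def normalize(query: str) -> str:
    """User-defined function: lower-case the text and collapse whitespace."""
    return " ".join(query.lower().split())


# Hypothetical sample rows.
rows = [
    SearchEvent("alice", "  Azure  Data LAKE ", 250),
    SearchEvent("bob", "u-sql", 40),
    SearchEvent("carol", "Dryad DAG", 900),
]

# Declarative-style step: choose the output columns and the filter, and call
# the user-defined function inside the column expression.
result = [
    (event.user, normalize(event.query))
    for event in rows
    if event.duration_ms > 100
]
print(result)
```

The comprehension plays the role of the query, stating which rows and columns are wanted, while `normalize` plays the role of the user code that the query's expressions invoke.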
Retirement
In 2021, Microsoft announced the 2024 retirement of the original Azure Data Lake Storage, now called "Gen1". The related Azure Data Lake Analytics / U-SQL technologies are also being retired.[5] Azure Data Lake Storage Gen2, an extension of Azure Storage, will continue.[6] The suggested replacement technologies are Azure Synapse Analytics and Apache Spark.[7]
References
1. "Data Lake". Microsoft Azure. Retrieved 2019-06-17.
2. Harris, Derrick (2015-02-05). "Why opening up its Cosmos big data system would be the right move for Microsoft". gigaom.com. Retrieved 2017-07-27.
3. "Data Lake | Microsoft Azure". azure.microsoft.com. Retrieved 2021-09-15.
4. Harris, Ed. "Cosmos" (PDF).
5. "Azure Data Lake Analytics will be retired on 29 February 2024". Microsoft Azure. Retrieved 2023-12-07.
6. "Retirement Announcement - Azure Data Lake Storage Gen1". Microsoft Azure. Retrieved 2023-12-07.
7. "Migrate Azure Data Lake Analytics to Azure Synapse Analytics". Microsoft Azure. Retrieved 2023-12-07.