Unfolding Hive Lateral View: A Detailed Guide

Apache Hive is an open-source data warehouse built on top of Hadoop for data summarization, querying, and analysis. It uses a SQL-like language called HiveQL to query and manage large datasets residing in distributed storage. One of Hive's key features is the Lateral View, a powerful tool for handling complex data types. This blog post explains Lateral Views in Hive in depth.

What is a Lateral View in Hive?

Lateral View is used in conjunction with user-defined table-generating functions (UDTFs) such as explode(). It applies the UDTF to each row of the base table and then joins the resulting output rows to the original input row, forming a virtual table in which each input row is paired with its corresponding output row(s).

When to Use Lateral View?

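Lateral View is most useful when a column holds a complex data type such as an array or a map and you need one row per element, for example to filter, join, or aggregate on the individual elements while keeping the other columns of the original row.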
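
As a rough illustration of that use case, the sketch below counts how many orders contain each item. The orders table and its columns are hypothetical names made up for this example; only LATERAL VIEW and explode() come from Hive itself.

```sql
-- Hypothetical table: one row per order, with the purchased items stored as an array
CREATE TABLE orders (order_id INT, items ARRAY<STRING>);

-- Flatten the array to one row per (order, item), then aggregate on the item
SELECT item, count(*) AS order_count
FROM orders
LATERAL VIEW explode(items) i AS item
GROUP BY item;
```

Without the Lateral View there is no direct way to group by the elements inside the items array, because each array is a single value in its row.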

Explode Function

The explode() function is the UDTF most often paired with Lateral View. It takes an array or a map as input and emits one row per element: a single column for an array, or a key column and a value column for a map.

Let's consider an example:

CREATE TABLE student (name STRING, scores ARRAY<INT>);
-- INSERT ... VALUES does not accept complex types in Hive, so build the array in a SELECT
INSERT INTO TABLE student SELECT 'John', array(90, 85, 88);

If we want to convert each score into a separate row, we can use the explode function with Lateral View as follows:

SELECT name, score 
FROM student 
LATERAL VIEW explode(scores) s AS score; 

The result would be:

John 90 
John 85 
John 88 
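
The same pattern applies to maps. The sketch below assumes a hypothetical student_grades table (the table and column names are invented for illustration) and shows that explode() on a map needs two column aliases, one for the key and one for the value.

```sql
-- Hypothetical table: one row per student, with grades keyed by subject
CREATE TABLE student_grades (name STRING, grades MAP<STRING, INT>);
INSERT INTO TABLE student_grades SELECT 'John', map('math', 90, 'physics', 85);

-- explode() on a map emits a key column and a value column per entry
SELECT name, subject, grade
FROM student_grades
LATERAL VIEW explode(grades) g AS subject, grade;
```

This should return one row for (John, math, 90) and one for (John, physics, 85); the order of the rows is not guaranteed.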


Inline Function

Another function often used with Lateral View is inline(), which takes an array of structs as input and outputs one row per struct, with each struct field as a separate column.
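
A minimal sketch of inline() with Lateral View follows. The class_results table, its columns, and the sample data are assumptions made up for this example.

```sql
-- Hypothetical table: one row per class, with results stored as an array of structs
CREATE TABLE class_results (
  class_name STRING,
  results ARRAY<STRUCT<name:STRING, score:INT>>
);

-- named_struct() builds each struct; the array is constructed in a SELECT
INSERT INTO TABLE class_results
SELECT 'Math', array(named_struct('name', 'John', 'score', 90),
                     named_struct('name', 'Mary', 'score', 95));

-- inline() emits one row per struct, with one column per struct field
SELECT class_name, student_name, score
FROM class_results
LATERAL VIEW inline(results) r AS student_name, score;
```

Each struct becomes its own row, so this should yield (Math, John, 90) and (Math, Mary, 95).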

Multiple Lateral Views

A single FROM clause can contain multiple Lateral Views, and each later Lateral View can reference columns produced by the ones to its left. Lateral Views can also be used in the same SQL statement as other table joins.
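
To make the behavior of chained Lateral Views concrete, here is a small sketch using a hypothetical student_subjects table (names and data are invented for illustration). Note that two independent Lateral Views produce every combination of their outputs, which may or may not be what you want.

```sql
-- Hypothetical table with two independent array columns
CREATE TABLE student_subjects (name STRING, subjects ARRAY<STRING>, scores ARRAY<INT>);
INSERT INTO TABLE student_subjects
SELECT 'John', array('math', 'physics'), array(90, 85);

-- Each LATERAL VIEW multiplies the rows: 2 subjects x 2 scores = 4 rows for John
SELECT name, subject, score
FROM student_subjects
LATERAL VIEW explode(subjects) sub AS subject
LATERAL VIEW explode(scores) sc AS score;
```

Because the two arrays are exploded independently, the result pairs every subject with every score rather than matching them position by position.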

Conclusion

Lateral View in Hive is a powerful tool for handling complex data types, making it easier to flatten nested data structures and work with individual elements. While it may seem complicated initially, understanding how to use Lateral View effectively can significantly enhance your data querying and analysis capabilities in Hive.

Remember, practice makes perfect. So, try out different scenarios and data types with Lateral View to get a firm grasp of this feature. Happy Hiving!