Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Pig Statement its taking a long time

avatar
Rising Star

I hive 45 text files with 5 columns and I'm using Pig to add a new column to each file based on it filename.

First question: I upload all the files into HDFS manually. Do you think is a better option upload a compress file?

Second question: I put my code bellow. In your opinion it is the best way to add a new column to my files?

I submit this code and it taking hours processing... All of my files are in Data directory...

Data = LOAD '/user/data' using PigStorage(' ','-tagFile')

 

STORE DATA INTO '/user/data/Data_Transformation/SourceFiles' USING PigStorage(' ');


Thanks!!!

3 ACCEPTED SOLUTIONS

avatar
Contributor
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Contributor
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Expert Contributor
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
4 REPLIES 4

avatar
Contributor
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Rising Star

I'll need to install notebook to use Spark and Python (there exists any tutorial to do that?). After that I think I will use your idea 🙂

avatar
Contributor
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Expert Contributor
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login