
Why is data not being profiled using Waterline?

Expert Contributor

Hi,

I have downloaded the Waterline Data sandbox machine, and the Waterline UI comes up with the pre-loaded, pre-profiled data. Everything works fine by default.

Now I want to profile some files under the /user/waterlinedata/newStaggingData directory. After copying them from local storage to HDFS, I ran the command ./waterline profileOnly /user/waterlinedata/newStaggingData. As far as I know, profiling identifies the file format, calculates data quality metrics, and stores the details in the inventory, but I cannot see any of these details next to my files in the Waterline UI.

Please see the attached image: capture1.png

I know that after I execute the above command, Waterline runs a MapReduce job, and I am sure that it runs correctly, but I still do not get any file format or data quality metrics in the UI.

Otherwise, please send me the steps to profile data in a particular directory.

Thanks in advance.

1 ACCEPTED SOLUTION

Hi @Manoj Dhake,

Glad to see you using the Waterline sandbox. You have the command right: waterline profileOnly <HDFS dir>.

To see what happened when you ran the command, you can get a very high-level look from the Dashboard page in the Waterline UI. If that shows that the job ran successfully, try refreshing the browser page (a hard refresh such as command-r or control-r might be needed). If the job doesn't appear in the dashboard, or the dashboard shows that it failed, you can look for more details in the job log: /var/log/waterline/wdi-inventory.log.

Because the sandboxes are pretty limited, it may be that the job is running out of resources. The wdi-inventory.log and the corresponding MapReduce logs should make that clear if that's the case.
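
As a rough sketch of the log check described above, a small shell helper can pull the most recent error-like lines out of the inventory log. The function name recent_errors and the search patterns are assumptions for illustration and are not part of Waterline; only the default log path comes from this thread.

```shell
# Hypothetical helper: print the most recent error-like lines from a log file.
# Default path is the inventory log mentioned in this thread; the patterns
# searched for are a guess at what a failed job might log.
recent_errors() {
  log="${1:-/var/log/waterline/wdi-inventory.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # -i: ignore case, -n: prefix each match with its line number
  grep -inE 'error|exception|denied|killed' "$log" | tail -n 20
}
```

Running recent_errors with no argument reads the default log; passing another path (for example a MapReduce task log) checks that file instead. If nothing is printed, the log contains none of the patterns, which does not by itself prove the job succeeded.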

Carol Drummond

Waterline Data Technical Support


2 REPLIES

Expert Contributor

Hi guys,

I was running Waterline jobs (profile, tag, and lineage jobs), but while the MapReduce code ran I got "Permission Denied" exceptions on some Waterline data directories. I resolved them by running:

sudo -u waterlinedata hadoop fs -chmod 777 <directory name>

After that, everything worked fine.
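
Before changing permissions, it can help to see who owns the directory and what mode it currently has. The sketch below wraps hadoop fs -ls -d for that purpose; the function name hdfs_dir_perms is hypothetical, and the field positions assume the usual listing layout (permissions, replication, owner, group, and so on).

```shell
# Hypothetical helper: print the permission string, owner and group of an HDFS path.
# Assumes the usual "hadoop fs -ls" column order:
#   permissions replication owner group size date time path
hdfs_dir_perms() {
  # -d lists the directory entry itself rather than its contents
  hadoop fs -ls -d "$1" | awk '$1 ~ /^[d-]/ {print $1, $3, $4}'
}
```

For example, hdfs_dir_perms /user/waterlinedata/newStaggingData prints one line with the mode, owner, and group. If the owner is not the user that runs the Waterline jobs, that would be consistent with the Permission Denied errors, and a narrower mode than 777 may be enough depending on which user and group need write access.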