Member since: 05-02-2016
Posts: 74
Kudos Received: 41
Solutions: 14
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3706 | 07-11-2018 01:40 PM |
 | 7490 | 01-05-2017 02:43 PM |
 | 1670 | 12-20-2016 01:17 PM |
 | 1524 | 12-02-2016 07:19 PM |
 | 2335 | 10-06-2016 01:29 PM |
11-28-2016
03:48 PM
1 Kudo
Hi @Garima Dosi. If you look at the full YARN logs (i.e. 'yarn logs -applicationId <app id>'), do you see "Container killed on request" in the logs? The exit code may hold some clues. Best Regards, Craig 🙂
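For example, a minimal way to pull the aggregated logs and scan for the kill message (the application id below is a placeholder; substitute your own):

```
# Fetch the aggregated YARN logs for the app and search for container-kill messages,
# keeping a little surrounding context so the exit code is visible
yarn logs -applicationId application_1479300000000_0001 | grep -i -B 1 -A 3 "Container killed"
```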
10-17-2016
12:43 PM
Do you have the full stack trace? Also, which versions of HDP and Sqoop are you on? Thanks
10-17-2016
12:40 PM
1 Kudo
This JIRA has some info about why you need to set the property: https://issues.apache.org/jira/browse/SQOOP-2910
10-06-2016
01:29 PM
Add "WITH ROLLUP" or "WITH CUBE" to the end of the query, like: select country, gender,event, count(*), count(distinct userId)from TABLE groupby country, gender,event WITH ROLLUP or select country, gender,event, count(*), count(distinct userId)from TABLE groupby country, gender,event WITH CUBE
09-15-2016
04:49 PM
2 Kudos
Hive will combine input splits by default (https://issues.apache.org/jira/browse/HIVE-2245). To control this behavior, look at the mapred.max.split.size and mapred.min.split.size properties.
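A minimal sketch of tuning these per session from the command line; the byte values and table name are illustrative placeholders, not recommendations:

```
# Larger max split size => fewer, bigger combined splits => fewer mappers
# 268435456 bytes = 256 MB; 134217728 bytes = 128 MB (illustrative values only)
hive -e "
set mapred.max.split.size=268435456;
set mapred.min.split.size=134217728;
select count(*) from some_table;
"
```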
09-06-2016
02:39 PM
2 Kudos
Set hive.exec.compress.output to false. Your cluster may be configured for compression by default, with the default codec. Alternatively, keep the compressed output but view it with the "hdfs dfs -text <file path>" command.
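A minimal sketch of both options; the output directory, file name, and table are placeholders:

```
# Option 1: turn off compressed output for this session's query
hive -e "
set hive.exec.compress.output=false;
insert overwrite directory '/tmp/query_out' select * from some_table;
"

# Option 2: keep compression on (skip the set above) and view the file anyway;
# 'hdfs dfs -text' decompresses supported codecs on the fly
hdfs dfs -text /tmp/query_out/000000_0
```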
09-05-2016
02:57 PM
The Spark version will be displayed in the console log output...
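You can also ask for the version directly instead of scanning the logs:

```
# Prints the Spark version banner and exits
spark-submit --version
```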
09-05-2016
02:57 PM
A good way to sanity check Spark is to start the Spark shell on YARN (spark-shell --master yarn) and run something like this:

val x = sc.textFile("some hdfs path to a text file or directory of text files")
x.count()

This will basically do a distributed line count. If that looks good, another sanity check is Hive integration. Run spark-sql (spark-sql --master yarn) and try to query a table that you know can be queried via Hive.
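For the Hive-integration check, a one-liner sketch; the database and table names are placeholders:

```
# Runs a single query through spark-sql on YARN and exits
spark-sql --master yarn -e "select count(*) from some_db.some_table"
```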
08-17-2016
07:31 PM
1 Kudo
I noticed "Too many open files", so you might want to check your ulimit setting. Check here for guidance: https://community.hortonworks.com/questions/2029/best-practices-for-ulimits-number-of-open-file-dec.html
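A couple of quick checks to compare the limit against actual usage; the pid is a placeholder:

```
# Current per-process open-file limit for this shell's user
ulimit -n

# How many files a given process actually has open right now
ls /proc/12345/fd | wc -l
```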