Member since: 05-02-2016
Posts: 74
Kudos Received: 41
Solutions: 14
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3706 | 07-11-2018 01:40 PM |
 | 7490 | 01-05-2017 02:43 PM |
 | 1670 | 12-20-2016 01:17 PM |
 | 1524 | 12-02-2016 07:19 PM |
 | 2335 | 10-06-2016 01:29 PM |
11-28-2016
03:48 PM
1 Kudo
Hi @Garima Dosi. If you look at the full YARN logs (i.e. 'yarn logs -applicationId <app id>'), do you see "Container killed on request" in the logs? The exit code may hold some clues. Best Regards, Craig 🙂
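For example, a minimal way to pull the aggregated logs and scan for the kill message (the application id below is a placeholder; substitute your own):

```
# Fetch the aggregated YARN logs for the app and search for container-kill messages,
# keeping a little surrounding context so the exit code is visible
yarn logs -applicationId application_1479300000000_0001 | grep -i -B 1 -A 3 "Container killed"
```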
10-17-2016
12:43 PM
Do you have the full stack trace? Also, which versions of HDP and Sqoop are you on? Thanks
10-17-2016
12:40 PM
1 Kudo
This JIRA has some info about why you need to set the property: https://issues.apache.org/jira/browse/SQOOP-2910
10-06-2016
01:29 PM
Add "WITH ROLLUP" or "WITH CUBE" to the end of the query, like: select country, gender,event, count(*), count(distinct userId)from TABLE groupby country, gender,event WITH ROLLUP or select country, gender,event, count(*), count(distinct userId)from TABLE groupby country, gender,event WITH CUBE
09-15-2016
04:49 PM
2 Kudos
Hive will combine input splits by default (https://issues.apache.org/jira/browse/HIVE-2245). To control this behavior, look at the mapred.max.split.size and mapred.min.split.size properties.
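A minimal sketch of tuning these per session from the command line; the byte values and table name are illustrative placeholders, not recommendations:

```
# Larger max split size => fewer, bigger combined splits => fewer mappers
# 268435456 bytes = 256 MB; 134217728 bytes = 128 MB (illustrative values only)
hive -e "
set mapred.max.split.size=268435456;
set mapred.min.split.size=134217728;
select count(*) from some_table;
"
```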
09-06-2016
02:39 PM
2 Kudos
Set hive.exec.compress.output to false. Your cluster may be configured for compression by default, with the default codec. Alternatively, keep the compressed output but view it with the "hdfs dfs -text <file path>" command.
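A minimal sketch of both options; the output directory, file name, and table are placeholders:

```
# Option 1: turn off compressed output for this session's query
hive -e "
set hive.exec.compress.output=false;
insert overwrite directory '/tmp/query_out' select * from some_table;
"

# Option 2: keep compression on (skip the set above) and view the file anyway;
# 'hdfs dfs -text' decompresses supported codecs on the fly
hdfs dfs -text /tmp/query_out/000000_0
```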
09-05-2016
02:57 PM
The Spark version will be displayed in the console log output...
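You can also ask for the version directly instead of scanning the logs:

```
# Prints the Spark version banner and exits
spark-submit --version
```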
09-05-2016
02:57 PM
A good way to sanity check Spark is to start the Spark shell on YARN (spark-shell --master yarn) and run something like this:

val x = sc.textFile("some hdfs path to a text file or directory of text files")
x.count()

This will basically do a distributed line count. If that looks good, another sanity check is Hive integration. Run spark-sql (spark-sql --master yarn) and try to query a table that you know can be queried via Hive.
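For the Hive-integration check, a one-liner sketch; the database and table names are placeholders:

```
# Runs a single query through spark-sql on YARN and exits
spark-sql --master yarn -e "select count(*) from some_db.some_table"
```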
08-17-2016
07:31 PM
1 Kudo
I noticed "Too many open files", so you might want to check your ulimit setting. Check here for guidance: https://community.hortonworks.com/questions/2029/best-practices-for-ulimits-number-of-open-file-dec.html
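A couple of quick checks to compare the limit against actual usage; the pid is a placeholder:

```
# Current per-process open-file limit for this shell's user
ulimit -n

# How many files a given process actually has open right now
ls /proc/12345/fd | wc -l
```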