Member since: 05-10-2016
Posts: 97
Kudos Received: 19
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3008 | 06-13-2017 09:20 AM |
| | 9021 | 02-02-2017 06:34 AM |
| | 3944 | 12-26-2016 12:36 PM |
| | 2468 | 12-26-2016 12:34 PM |
| | 50734 | 12-22-2016 05:32 AM |
07-27-2018
08:24 AM
1 Kudo
Since this question has already been marked resolved and you are looking for Python examples rather than PySpark, you may want to ask a new question. You may also want to look at the various Python libraries that already implement access to HDFS data.
10-11-2017
01:18 PM
Can you verify that the shuffle auxiliary service is enabled within YARN?
06-22-2017
01:41 PM
1 Kudo
You still need the spark-streaming dependency, but instead of version 2.1.1 you will want to match your Spark core version, 1.6.3.
06-15-2017
06:37 AM
1 Kudo
Why are you trying to connect to Impala via JDBC to write the data? You can write the data directly to storage through Spark and still access it through Impala after running "REFRESH <table>" in Impala. This avoids the issues you are having and should be more performant.
06-13-2017
09:20 AM
Hi Sidhartha,
It appears you are using a newer version of spark-streaming (2.1.1). The spark-streaming-twitter artifact includes spark-streaming 1.6.3, and since you are using Spark 1.6, the 2.1.1 dependency may be causing conflicts. There is no need to include the spark-streaming dependency explicitly; it will be pulled in as a transitive dependency of spark-streaming-twitter.
Jason
06-13-2017
07:59 AM
Hi Msdhan,
What are the schema and file format of the Impala table? Why not write the data directly and avoid a JDBC connection to Impala?
Jason
02-02-2017
06:34 AM
1 Kudo
This is currently an issue with numeric data types. It is resolved in 2.0, but you can work around the issue by casting to VARCHAR, or by importing the data into an RDD and then converting it to a DataFrame.
01-03-2017
08:15 AM
1 Kudo
Yes, this would typically not be recommended, but it is a workaround for a bug. This is fixed in CM 5.9, so when using 5.9 and newer you should not need to disable parcel relation validation.
12-26-2016
01:26 PM
No problem. The name is a bit misleading, but 5.7 is the minimum version required; installing that parcel won't be a problem with 5.9. The requirements section [1] has a bit more information on supported versions. 1. http://www.cloudera.com/documentation/spark2/latest/topics/spark2_requirements.html
12-26-2016
12:36 PM
Spark 2.0 is available as a parcel as well, so you shouldn't need to move to packages unless you have another reason. Spark 2.0 is out of beta now and is GA. Here is more information on how to install Spark 2 with Cloudera Manager: http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html