Member since: 03-23-2015
Posts: 1288
Kudos Received: 114
Solutions: 98
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3295 | 06-11-2020 02:45 PM |
| | 5012 | 05-01-2020 12:23 AM |
| | 2815 | 04-21-2020 03:38 PM |
| | 2619 | 04-14-2020 12:26 AM |
| | 2315 | 02-27-2020 05:51 PM |
10-14-2019
04:46 PM
@Plop564 I am not an expert in Spark, but my understanding is below:

1. "I will have 100 output files" >>> This depends on how many partitions your original DataFrame has. "coalesce" can only reduce the number of partitions, so if you had fewer than 100 partitions to begin with it won't do anything, as "coalesce" does not shuffle. If you want to guarantee the number of output files, I believe the "repartition" function is the better choice.
2. "Each single CSV file is locally sorted, I mean by the 'date' column ascending" >>> Yes.
3. "Files are globally sorted, I mean CSV part-0000 has 'date' values lower than CSV part-0001, part-0001 lower than part-0002, and so on" >>> I believe this is also yes, but I will wait for other Spark experts to confirm. A sketch of what I mean is below.

Cheers,
Eric
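For illustration, here is a minimal PySpark sketch of points 1-3, assuming a DataFrame with a "date" column; the paths and the input format are made up, not from the original question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sorted-csv-sketch").getOrCreate()

# Hypothetical input; replace with the real source DataFrame.
df = spark.read.parquet("/tmp/input")

(df.repartitionByRange(100, "date")   # range shuffle: part-00000 gets the lowest dates, part-00099 the highest
   .sortWithinPartitions("date")      # each output CSV file is locally sorted ascending by "date"
   .write
   .option("header", "true")
   .csv("/tmp/output_csv"))

# By contrast, coalesce(100) only lowers the partition count without shuffling,
# so it cannot increase the number of files or control how rows are distributed.
```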
10-10-2019
10:41 PM
@sbn, /etc/spark2/conf should be a symlink to /etc/spark2/conf.cloudera.spark2_on_yarn. Can you confirm by running the following?

ls -al /etc/spark2

Cheers,
Eric
10-06-2019
03:32 PM
1 Kudo
@Mekaam, glad that it helped. Cheers, Eric
10-06-2019
03:20 PM
@pramana, it looks like you are using Ubuntu "bionic", which is not supported in CDH/CM 5.16.x. Bionic is only supported from CDH 6.2 onwards:

https://docs.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#c516_supported_os
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_os_requirements.html#c63_supported_os

So you need to either try the 6.2 release of CM/CDH or change your Ubuntu OS version.

Hope that helps.

Cheers,
Eric
10-06-2019
03:13 PM
@gimp077, did you mean that "REFRESH" takes time and you eventually see the updated data, just with some delay? How big is the table, in terms of the number of partitions and the number of files in HDFS?

Eric
10-06-2019
03:11 PM
@priyanka1_munja, are you saying that the same partition appears multiple times? Did you notice the extra space before some of the partition keys? For example, "03-04-2015" vs " 03-04-2015". I think that is the reason for the duplicates. A small illustration is below.

Cheers,
Eric
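A minimal PySpark illustration of what I mean, using made-up data and a made-up output path (not from the original question):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-space-demo").getOrCreate()

# Note the leading space in the second row's "dt" value.
df = spark.createDataFrame(
    [("a", "03-04-2015"), ("b", " 03-04-2015")],
    ["id", "dt"],
)

df.write.mode("overwrite").partitionBy("dt").parquet("/tmp/partition_space_demo")

# The output ends up with two distinct partition directories, one for "03-04-2015"
# and one for " 03-04-2015", because the partition values differ by the leading space.
# Trimming the column first (pyspark.sql.functions.trim) collapses them into one partition.
```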
10-04-2019
04:02 AM
1 Kudo
Hmm, you missed the database name in the connection string. Try the below:

beeline -u 'jdbc:hive2://slave1:10000/default;ssl=true;sslTrustStore=/var/run/cloudera-scm-agent/process/72-hive-HIVESERVER2/cm-auto-host_keystore.jks;trustStorePassword=yeap4IhJzRvK5gBGVMeTahoL21BNmBF2TSi46pbQTP6'

Cheers,
Eric
10-04-2019
12:04 AM
@Mekaam, can you please add quotes around the JDBC connection string? Like below:

beeline -u 'jdbc:hive2://slave1:10000;ssl=true;sslTrustStore=/var/run/cloudera-scm-agent/process/72-hive-HIVESERVER2/cm-auto-host_keystore.jks;trustStorePassword=yeap4IhJzRvK5gBGVMeTahoL21BNmBF2TSi46pbQTP6'

I believe that without quotes it will cause issues. If it is still not working, check the HS2 log to see what it complains about on the server side.

Cheers,
Eric
09-30-2019
04:33 PM
1 Kudo
@parthk, there is no date locked in yet for the new Impala release that will support Ranger. However, I would like to ask why you do not want to have Kerberos. Authorization does not work properly without authentication in front of it. Think about an online application: you surely want users to log in first, before you can decide what level of access they should have. The same applies in the CDH world. Kerberos acts as the front-end login, and Sentry/Ranger acts as the back-end authorization control, so without Kerberos you are allowing everyone to access CDH.

I strongly suggest you implement Kerberos before Sentry; the same story applies to Ranger.

Cheers,
Eric