About Eukrev

KokaN · ‎12-07-2020

2020 update: what are the preferred data quality tools compatible with CDH for Hive, HBase, and Solr? Our team is looking at Apache Griffin. Regards, Nithya Koka

Eukrev · ‎10-31-2017

Thanks for the suggestion. I am opting for the second solution, as the data is not one big continuous row and the first solution did not work.

djib100 · ‎08-28-2017

Great! I set up a Spark cluster with 2 workers. I save a DataFrame in Parquet format, using partitionBy("column x"), to some path on each worker. I am able to save it, but when I try to read it back I get these errors:
- Could not read footer for file FileStatus ......
- unable to specify Schema ...
Any suggestions?

mqureshi · ‎04-03-2017

@Revathy Mourouguessane, in your Hive table properties you can set skip.footer.line.count to remove the footer from your data. If you have just a one-line footer, set this value to 1. Specify it in the tblproperties clause of your CREATE TABLE statement: tblproperties("skip.header.line.count"="1", "skip.footer.line.count"="1");
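To make the effect of those two properties concrete, here is a minimal Python sketch of the row trimming they describe. It only illustrates the idea of dropping leading and trailing lines from a file's contents; it is not Hive's implementation, and the function name skip_header_footer and the sample data are hypothetical.

```python
def skip_header_footer(lines, header_count=1, footer_count=1):
    """Return lines with the first header_count and last footer_count removed.

    Illustration only: mimics the intent of skip.header.line.count and
    skip.footer.line.count on a single file's lines.
    """
    if header_count < 0 or footer_count < 0:
        raise ValueError("counts must be non-negative")
    end = len(lines) - footer_count
    # If header and footer together cover the whole file, no data rows remain.
    if end <= header_count:
        return []
    return lines[header_count:end]


# Hypothetical file contents: one header line, two data rows, one footer line.
raw = ["id,name", "1,alice", "2,bob", "TOTAL,2"]
rows = skip_header_footer(raw, header_count=1, footer_count=1)
print(rows)
```

With both counts set to 1, only the two data rows remain, which is the result you would expect to see when querying the table.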

agauthier · ‎06-09-2016

Rajkumar, have you tried connecting directly with the Hive JDBC driver? I suspect it's a jar conflict somewhere. Here's my Hive driver config in IntelliJ. I took the shotgun approach and added all the client jars, but the main ones required are hive-common and hive-jdbc.

Eukrev · ‎05-25-2016

Thank you. Do you know of any generic scripts developed in Spark for data profiling and data cleaning that you can share?

Eukrev · ‎05-06-2016

Hi Abdel, I haven't tried this one; I used a join instead. I will try it. Thank you.

pminovic · ‎04-23-2016

What's your max counter now, and what does the error message say? You can try increasing tez.counters.max. The Tez default is 2000, but in the latest version of Ambari it's set to 10000. Also, make sure you are using Pig-0.15 as packaged in one of the latest versions of HDP; in Pig-0.14, tez_local mode was unstable.

You can change tez.counters.max in Ambari or set it per Pig run:

pig -D tez.counters.max=10000 -x tez_local

By the way, what happens if you run your command in Tez mode on a cluster?

Eukrev · ‎04-22-2016

I tried "ambari-server restart". The restart was successful, but the stack wasn't up, so I restarted all the components. Thank you, Hive is up.

betocolsf · ‎06-10-2019

Do you know if there is a way to specify a Python virtual environment for streaming_python to use instead of the base Python installation?

Last Visited	‎04-07-2017 03:27 AM

Member Since	‎04-07-2017 02:31 AM
Posts	80
Kudos received	33

Cloudera Community

Re: Recommended data quality test suite for Hive /...

Re: MapReduce: FixedRecordReader - Partial record ...

Re: Write / Read Parquet File in Spark

Re: Apache hive: to ignore the header and footer

Re: while running a hive jdbc client from Intellij...

Re: Data quality analysis

Re: pig - Filter output of cogroup having NULL

Re: Pig -x tez_local: counters.LimitExceedsExcepti...

Re: Hive error: H110 Unable to submit statement. I...

Re: Pig: Streaming through python