Member since: 04-07-2017
Posts: 80
Kudos Received: 33
Solutions: 0
12-07-2020
11:36 AM
2020 update: what are the preferred data quality tools compatible with CDH for Hive, HBase, and Solr? Our team is looking at Apache Griffin. Regards, Nithya Koka
10-31-2017
06:37 PM
Thanks for the suggestion. I am opting for the 2nd solution, as the data is not one big continuous row and the 1st solution did not work.
08-28-2017
11:25 AM
Great!! I set up a Spark cluster with 2 workers. I save a DataFrame in Parquet format using partitionBy("column x") to some path on each worker. The problem is that I am able to save it, but when I try to read it back I get these errors:
- Could not read footer for file FileStatus ......
- Unable to specify Schema ...
Any suggestions?
04-03-2017
05:38 PM
1 Kudo
@Revathy Mourouguessane In your Hive table properties you can specify skip.footer.line.count to remove the footer from your data. If you have just a one-line footer, set this value to 1. You specify this in the properties of your create table statement: tblproperties("skip.header.line.count"="1", "skip.footer.line.count"="1");
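For anyone generating the DDL from a script, here is a minimal sketch of where those properties sit in a full statement. The helper build_create_table, the table name sales_raw, its columns, and the comma delimiter are all made up for illustration; only the two tblproperties keys come from the answer above.

```python
def build_create_table(table, columns, header_lines=1, footer_lines=1):
    """Return a Hive CREATE TABLE statement for a delimited text table.

    Hypothetical helper: table, columns and the ',' delimiter are
    illustrative assumptions, not taken from the original question.
    """
    if header_lines < 0 or footer_lines < 0:
        raise ValueError("line counts must be non-negative")
    # One "name TYPE" entry per column, indented for readability.
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f'TBLPROPERTIES ("skip.header.line.count"="{header_lines}", '
        f'"skip.footer.line.count"="{footer_lines}");'
    )


# Example table with one header line and one footer line to skip.
ddl = build_create_table("sales_raw", [("id", "INT"), ("amount", "DOUBLE")])
print(ddl)
```

Adjust footer_lines if your files end with more than one trailer line.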
06-09-2016
06:45 PM
Rajkumar, have you tried connecting directly with the Hive JDBC driver? I suspect it's a jar conflict somewhere. Here's my Hive driver config in IntelliJ. I obviously took the shotgun approach and added all the client jars, but the main ones required are hive-common and hive-jdbc.
05-25-2016
05:55 AM
Thank you. Do you know of any generic scripts developed in Spark for data profiling and data cleaning that you can share?
05-06-2016
12:54 AM
Hi Abdel, I haven't tried this one; I used a join instead. I will give it a try. Thank you.
04-23-2016
03:41 AM
What's your max counter setting now, and what does the error message say? You can try increasing tez.counters.max: the Tez default is 2000, but in the latest version of Ambari it's set to 10000. Also, make sure you are using Pig-0.15 as packaged in one of the latest versions of HDP. In Pig-0.14, tez_local mode was unstable. So, you can change tez.counters.max in Ambari or set it per Pig run:
pig -D tez.counters.max=10000 -x tez_local
By the way, what happens if you run your command in Tez mode, on a cluster?
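If you launch Pig from a wrapper script, a small sketch like the one below keeps the counter limit in one place. The helper build_pig_command and the script name myscript.pig are hypothetical; the flags simply mirror the command as written above.

```python
import shlex


def build_pig_command(script, max_counters=10000, exec_type="tez_local"):
    """Return the argv list for a Pig run with tez.counters.max overridden.

    Hypothetical helper; the flag layout mirrors the command quoted in
    this post, and the script path is an illustrative assumption.
    """
    if max_counters <= 0:
        raise ValueError("max_counters must be positive")
    return ["pig", "-D", f"tez.counters.max={max_counters}", "-x", exec_type, script]


cmd = build_pig_command("myscript.pig")
# Print the shell form; pass cmd to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Passing exec_type="tez" instead would build the command for the cluster run mentioned above.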
04-22-2016
03:14 PM
I tried "ambari-server restart". The restart was successful, but the stack wasn't up, so I restarted all the components. Thank you, Hive is up now.
06-10-2019
05:59 PM
Do you know if there is a way to specify a Python virtual environment for streaming_python to use, instead of the base Python installation?