Member since: 01-09-2017 · Posts: 55 · Kudos Received: 14 · Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3686 | 09-05-2016 10:38 AM
 | 1858 | 07-20-2016 08:22 AM
 | 3764 | 07-04-2016 08:13 AM
 | 1441 | 06-03-2016 08:01 AM
 | 2001 | 05-05-2016 12:37 PM
06-27-2016
07:22 AM
Maybe it's not the only issue, but you do have to specify an alias for the subquery. Try that and let us know whether you hit other issues or the same error still remains.
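As a sketch of what "alias the subquery" means in Hive (table and column names here are placeholders, not from the original question):

```sql
-- Without an alias, Hive rejects the derived table:
--   SELECT * FROM (SELECT id, name FROM users WHERE active = 1);
-- Adding an alias (t) after the closing parenthesis fixes it:
SELECT t.id, t.name
FROM (SELECT id, name FROM users WHERE active = 1) t;
```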
06-27-2016
07:18 AM
You can simply use spark-submit, which is in the bin folder of your spark-client installation. You can find its documentation here: http://spark.apache.org/docs/latest/submitting-applications.html
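A minimal invocation looks like this — the class name, jar, and executor count are placeholders for your own application, and the spark-client path assumes an HDP layout:

```shell
# Submit a Spark application to YARN (paths and names are examples only)
/usr/hdp/current/spark-client/bin/spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  my-app.jar arg1 arg2
```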
06-01-2016
10:33 AM
1 Kudo
If you just run rm, you're actually moving your data to the Trash. To remove the data from HDFS and free the space immediately, add the -skipTrash flag to the rm command.
To delete the data already sitting in the trash, you can run:
hdfs dfs -expunge
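For example (the path is a placeholder; -r is needed for directories):

```shell
# Delete permanently, bypassing the Trash so space is freed right away
hdfs dfs -rm -r -skipTrash /user/me/old_data
```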
05-11-2016
08:49 PM
1 Kudo
With Hive, what you can do is read those fields as strings and validate them through a regexp. Otherwise, if you are sure you don't have NULL values in the source, you can simply define your schema with typed columns: if an int field then comes back NULL, it means the value was not properly formatted.
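A sketch of the regexp approach, assuming a staging table with string columns (table and column names are made up for illustration):

```sql
-- Staging table reads everything as strings
CREATE TABLE staging (id STRING, amount STRING);

-- Keep only rows whose field is a well-formed integer, then cast it
SELECT id, CAST(amount AS INT) AS amount
FROM staging
WHERE amount RLIKE '^-?[0-9]+$';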
05-05-2016
12:37 PM
1 Kudo
In an HDP distribution it should be located in /usr/hdp/current/hadoop-mapreduce-client/
04-29-2016
01:52 PM
1 Kudo
That INFO message only states that no Tez session was available, so a new one had to be created. This is not an issue: the only consequence is that the query takes a bit longer to start, because the resources have to be allocated first. The real problem is more likely the Hive View, which is not properly showing you the results. If you run the same query via beeline on the command line, you will see all the results.
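For example, something like this from the command line — the JDBC URL, user, and query are placeholders for your own cluster:

```shell
# Run the same query through beeline instead of the Hive View
beeline -u "jdbc:hive2://hiveserver-host:10000/default" \
        -n hive \
        -e "SELECT count(*) FROM my_table;"
```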
04-22-2016
02:10 PM
I think the best option for compiling Scala Spark code is to use sbt, which is a tool for managing builds and dependencies. You can do the same with Maven anyway, whichever you prefer.
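A minimal build.sbt sketch — the project name and version numbers are assumptions, so pick the Spark and Scala versions that match your cluster:

```scala
// Minimal build.sbt for a Spark application (versions are examples)
name := "my-spark-app"
version := "0.1"
scalaVersion := "2.10.6"

// "provided" because the cluster supplies Spark at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```

Then `sbt package` produces the jar you pass to spark-submit.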
04-20-2016
09:22 AM
1 Kudo
It turned out that the problem was caused by a join with a subquery, which made the data unevenly distributed among the partitions. I don't know exactly why this happens, but we solved it by materializing the subquery. Thank you for the support.
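For anyone hitting the same skew, "materializing the subquery" means writing its result to a real table first and joining against that — all table and column names below are placeholders:

```sql
-- Before: join directly against the subquery (skewed in our case)
-- SELECT a.*, s.total
-- FROM events a
-- JOIN (SELECT user_id, SUM(amount) AS total
--       FROM orders GROUP BY user_id) s
--   ON a.user_id = s.user_id;

-- After: materialize the subquery, then join the table
CREATE TABLE order_totals AS
SELECT user_id, SUM(amount) AS total
FROM orders
GROUP BY user_id;

SELECT a.*, s.total
FROM events a
JOIN order_totals s ON a.user_id = s.user_id;
```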