Member since: 05-28-2015
Posts: 47
Kudos Received: 28
Solutions: 7

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 5896 | 06-20-2016 04:00 PM |
|  | 10571 | 01-16-2016 03:15 PM |
|  | 11165 | 01-16-2016 05:06 AM |
|  | 5162 | 01-14-2016 06:45 PM |
|  | 2899 | 01-14-2016 01:56 AM |
01-21-2016
08:59 PM
Try specifying the arguments like <arg>outputpath=hdfs://localhost/output/file.txt</arg> as below:
<master>local[*]</master>
<mode>client</mode>
<name>Spark Example</name>
<class>org.apache.spark.examples.mllib.JavaALS</class>
<jar>/lib/spark-examples_2.10-1.1.0.jar</jar>
<spark-opts>--executor-memory 20G --num-executors 50</spark-opts>
<arg>inputpath=hdfs://localhost/input/file.txt</arg>
<arg>outputpath=hdfs://localhost/output/file.txt</arg>
<arg>value=2</arg>
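If the action still fails, double-check how the workflow is submitted. A minimal sketch of submitting and inspecting it with the Oozie CLI, assuming the workflow.xml containing this <spark> action is already on HDFS and a local job.properties points at it (the Oozie server URL and file names are placeholders):

# Hypothetical submission of the workflow containing the <spark> action above;
# the Oozie server URL and job.properties path are assumptions, adjust to your setup.
oozie job -oozie http://localhost:11000/oozie -config job.properties -run

# Follow the job once it is running (replace <job-id> with the id printed above).
oozie job -oozie http://localhost:11000/oozie -info <job-id>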
01-16-2016
03:15 PM
Try the below:
1. Stop HDFS: $HADOOP_HOME/sbin/stop-dfs.sh
2. Remove the temp folder. Check the NameNode log to find the current name dir.
3. Set the NameNode and DataNode directories in hdfs-site.xml to your preferred location:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/users/gangadharkadam/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/users/gangadharkadam/hadoopdata/hdfs/datanode</value>
</property>
4. Set the permissions on the new directories:
sudo chown gangadharkadam:staff /users/gangadharkadam/hadoopdata/hdfs/namenode
sudo chmod 750 /users/gangadharkadam/hadoopdata/hdfs/namenode
5. Format the NameNode: hdfs namenode -format
6. Start HDFS again: $HADOOP_HOME/sbin/start-dfs.sh
7. Check the running daemons using jps -l
Have good luck with your new HDFS 🙂
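P.S. Expanding on step 7, a quick sanity check once HDFS is back up (a small sketch; jps and dfsadmin are standard Apache Hadoop tools, adjust if your distribution wraps them in service scripts):

jps -l                   # expect NameNode, DataNode and SecondaryNameNode processes
hdfs dfsadmin -report    # confirms the DataNode registered against the new directories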
01-16-2016
05:06 AM
2 Kudos
Hive is trying to provide a value for the new column for those records where it did not exist. You need to specify a default for the new column using the 'avro.schema.literal' table property. In the example below, the original table has just one column; age is added in the second version of the schema. If the file being read has a different schema, Hive will attempt to convert it using Avro schema resolution. The entire definition lives in the avro.schema.literal property:
ALTER TABLE test_avro SET TBLPROPERTIES (
  'avro.schema.literal'='{"name":"test_record", "type":"record", "fields": [
    {"name":"full_name", "type":"string"},
    {"name":"age", "type":"int", "default":999}]}');
Hope this helps.
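To see the schema resolution in action, query the table after the ALTER; rows written with the old single-column schema should come back with age = 999. A hedged sketch via beeline (the HiveServer2 URL is a placeholder):

# Assumes a test_avro table whose underlying files were written with the old schema.
beeline -u jdbc:hive2://localhost:10000/default \
  -e "SELECT full_name, age FROM test_avro LIMIT 5;"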
01-14-2016
06:45 PM
4 Kudos
-- Register the jars
REGISTER lib/parquet-pig-1.3.1.jar;
REGISTER lib/parquet-column-1.3.1.jar;
REGISTER lib/parquet-common-1.3.1.jar;
REGISTER lib/parquet-format-2.0.0.jar;
REGISTER lib/parquet-hadoop-1.3.1.jar;
REGISTER lib/parquet-encoding-1.3.1.jar;

-- Options you might want to fiddle with
SET parquet.page.size 1048576;     -- default; this is your min read/write unit
SET parquet.block.size 134217728;  -- default; your memory budget for buffering data
SET parquet.compression gzip;      -- or none, snappy, lzo

-- Store in Parquet format
STORE mydata INTO '/some/path' USING parquet.pig.ParquetStorer;

-- Reading
mydata = LOAD '/some/path' USING parquet.pig.ParquetLoader AS (x: int, y: int);
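To run it, save the statements above into a script and launch Pig in MapReduce mode with the Parquet jars available under ./lib (the script name below is just an assumption):

# Hypothetical script name; run from the directory that contains the lib/ folder.
pig -x mapreduce parquet_example.pig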
01-14-2016
05:50 PM
Check this MoveTask implementation: https://github.com/apache/hive/blob/82fd1bdbe70acbbdf9c9fc5b227f111005f9d87a/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
01-14-2016
05:23 PM
What MoveTask does is move files from the /tmp volume to the /user volume. When the user running the query doesn't have the right permissions, the files cannot be moved between volumes and this exception is thrown. Possible workarounds:
- Check that /user and /tmp have the full permissions.
- Check that the following are set to true:
hive.metastore.client.setugi=true and
hive.metastore.server.setugi=true
These parameters instruct Hive to execute jobs as the current shell user; if they are not set, it tries executing them as root.
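For the permission check itself, something along these lines (a sketch; the 1777/755 modes mirror the usual HDFS defaults for /tmp and /user, run as the HDFS superuser and adjust to your security policy):

hdfs dfs -ls -d /tmp /user     # inspect the current owners and modes
hdfs dfs -chmod 1777 /tmp      # world-writable scratch space with the sticky bit
hdfs dfs -chmod 755 /user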
01-14-2016
01:56 AM
1 Kudo
This message will pop up any time an application is requesting more resources from the cluster than the cluster can currently provide. What resources, you might ask? Spark is only looking for two things: cores and RAM. Cores represent the number of open executor slots that your cluster provides for execution, and RAM refers to the amount of free RAM required on any worker running your application. Note that for both of these resources the maximum value is not your system's max; it is the max as set by your Spark configuration.
1. Check out the current state of your cluster (and its free resources) at SparkMasterIP:7080.
2. Make sure you have not started Spark shells in two different terminals. The first Spark shell might consume all the available cores in the system, leaving the second shell waiting for resources. Until the first Spark shell is terminated and its resources are released, all other apps will display the above warning.
The short-term solution to this problem is to make sure you aren't requesting more resources from your cluster than exist, or to shut down any apps that are unnecessarily using resources. If you need to run multiple Spark apps simultaneously, then you'll need to adjust the amount of cores being used by each app.
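For example, on a standalone cluster you can cap what each shell asks for so that two of them fit at once; a minimal sketch (the master URL and the numbers are placeholders, size them to your workers):

spark-shell --master spark://SparkMasterIP:7077 \
  --total-executor-cores 4 \
  --executor-memory 2g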
01-11-2016
11:54 AM
3 Kudos
A temporary table is a convenient way for an application to automatically manage intermediate data generated during a complex query. Rather than manually deleting tables needed only as temporary data in a complex query, Hive automatically deletes all temporary tables at the end of the Hive session in which they are created. The data in these tables is stored in the user's scratch directory rather than in the Hive warehouse directory. The scratch directory effectively acts as the user's data sandbox, located by default in /tmp/hive-<username>. Hive users create temporary tables using the TEMPORARY keyword:
CREATE TEMPORARY TABLE tmp1(c1 string);
CREATE TEMPORARY TABLE tmp2 AS..
CREATE TEMPORARY TABLE tmp3 LIKE..
Multiple Hive users can create multiple Hive temporary tables with the same name because each table resides in a separate session. Temporary tables support most table options, but not all. The following features are not supported:
- Partition columns
- Indexes
A temporary table with the same name as a permanent table will cause all references to that table name to resolve to the temporary table. The user cannot access the permanent table during that session without dropping or renaming the temporary table.
01-07-2016
07:21 PM
3 Kudos
Is your HiveServer2 running in HTTP mode? The connection URL when HiveServer2 is running in HTTP mode is:
jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>
where <http_endpoint> is the corresponding HTTP endpoint configured in hive-site.xml. The default value is cliservice, and the default port for HTTP transport mode is 10001.
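For example, connecting with beeline using that template (the host and database are placeholders; the port and httpPath shown are the defaults mentioned above):

beeline -u "jdbc:hive2://hs2host:10001/default;transportMode=http;httpPath=cliservice"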
01-03-2016
12:14 AM
1 Kudo
HADOOP_HOME=/usr/hdp/current/hadoop-client
HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
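If you are setting these for a client shell, exporting them (for example from ~/.bashrc or a client-side env script) before launching the Hive/HCatalog client is the usual route; a minimal sketch reusing the same paths:

export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar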