Member since: 05-28-2015
Posts: 47
Kudos Received: 28
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5904 | 06-20-2016 04:00 PM |
| | 10616 | 01-16-2016 03:15 PM |
| | 11210 | 01-16-2016 05:06 AM |
| | 5182 | 01-14-2016 06:45 PM |
| | 2918 | 01-14-2016 01:56 AM |
01-21-2016
08:59 PM
Try specifying the arguments like <arg>inputpath=hdfs://localhost/input/file.txt</arg>, as in the configuration below:
<master>local[*]</master>
<mode>client</mode>
<name>Spark Example</name>
<class>org.apache.spark.examples.mllib.JavaALS</class>
<jar>/lib/spark-examples_2.10-1.1.0.jar</jar>
<spark-opts>--executor-memory 20G --num-executors 50</spark-opts>
<arg>inputpath=hdfs://localhost/input/file.txt</arg>
<arg>outputpath=hdfs://localhost/output/file.txt</arg>
<arg>value=2</arg>
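Once the action validates, the workflow can be submitted from the Oozie CLI; a minimal sketch, assuming the elements above live in a workflow.xml on HDFS referenced from a local job.properties (the Oozie URL and job id are placeholders):
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
# check the action afterwards, using the id returned by -run
oozie job -oozie http://localhost:11000/oozie -info <job-id>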
01-16-2016
03:15 PM
Try the below:
1. Stop HDFS: $HADOOP_HOME/sbin/stop-dfs.sh
2. Remove the temp folder. Check the log to get the name dir.
3. Set the NameNode and DataNode directories in hdfs-site.xml to your preferred location:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/users/gangadharkadam/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/users/gangadharkadam/hadoopdata/hdfs/datanode</value>
</property>
4. Set the permissions on the new directories:
sudo chown gangadharkadam:staff /users/gangadharkadam/hadoopdata/hdfs/namenode
sudo chmod 750 /users/gangadharkadam/hadoopdata/hdfs/namenode
5. Format the NameNode: hdfs namenode -format
6. Start HDFS again: $HADOOP_HOME/sbin/start-dfs.sh
7. Check the running daemons using jps -l
Have good luck with your new HDFS 🙂
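To confirm the new directories are actually in use after the restart, a quick sanity check (ports below are the stock defaults, adjust for your version):
# report live DataNodes and the storage they expose
hdfs dfsadmin -report
# the NameNode web UI (port 50070 on Hadoop 2.x, 9870 on 3.x),
# e.g. http://localhost:50070, lets you verify the NameNode came up cleanly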
01-16-2016
05:06 AM
2 Kudos
Hive is trying to provide a value for the new column for records where it did not exist, so you need to specify a default for the new column using the 'avro.schema.literal' table property. In the example below the original table has just one column; age is added in the second version of the schema. If the file being read has a different schema, Hive will attempt to convert it using Avro schema resolution. The entire definition is in the avro.schema.literal property.
ALTER TABLE test_avro SET TBLPROPERTIES (
'avro.schema.literal'='{"name":"test_record",
"type":"record",
"fields": [
{"name":"full_name", "type":"string"},
{"name":"age", "type":"int", "default":999}]}');
Hope this helps.
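As a quick sanity check (the connection URL is a placeholder for your HiveServer2), rows written with the old one-column schema should now come back with the default age:
beeline -u jdbc:hive2://localhost:10000/default -e "SELECT full_name, age FROM test_avro"
# records written before the column was added should show age = 999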
01-14-2016
06:45 PM
4 Kudos
--Register the jars
REGISTER lib/parquet-pig-1.3.1.jar;
REGISTER lib/parquet-column-1.3.1.jar;
REGISTER lib/parquet-common-1.3.1.jar;
REGISTER lib/parquet-format-2.0.0.jar;
REGISTER lib/parquet-hadoop-1.3.1.jar;
REGISTER lib/parquet-encoding-1.3.1.jar;

--Store in Parquet format
SET parquet.compression gzip; -- or snappy
STORE table INTO '/path/to/table' USING parquet.pig.ParquetStorer;

--Options you might want to fiddle with
SET parquet.page.size 1048576;     -- default; this is your min read/write unit
SET parquet.block.size 134217728;  -- default; your memory budget for buffering data
SET parquet.compression lzo;       -- or you can use none, gzip, snappy
STORE mydata INTO '/some/path' USING parquet.pig.ParquetStorer;

--Reading
mydata = LOAD '/some/path' USING parquet.pig.ParquetLoader AS (x:int, y:int);
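If the statements above are saved as a script, it can be launched with the pig runner; a small sketch assuming the script is called store_parquet.pig and the Parquet jars sit under lib/ relative to where you launch it:
# run against the cluster; use -x local for a quick test on local files
pig -x mapreduce store_parquet.pig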
01-14-2016
05:50 PM
Check this MoveTask implementation: https://github.com/apache/hive/blob/82fd1bdbe70acbbdf9c9fc5b227f111005f9d87a/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
01-14-2016
05:23 PM
What MoveTask does is move files from the /tmp volume to the /user volume. When the user running the job doesn't have the right permissions, the move between volumes is not allowed and this exception is thrown. Possible workarounds:
- Check that /user and /tmp have the full permissions.
- Check that the following properties are set to true:
hive.metastore.client.setugi=true
hive.metastore.server.setugi=true
These parameters instruct Hive to execute jobs as the current shell user; if that does not work, try executing as root.
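A rough sketch of the permission fixes (run as the HDFS superuser; the user name is a placeholder):
# scratch space should be world-writable with the sticky bit
hdfs dfs -ls -d /tmp /user
hdfs dfs -chmod 1777 /tmp
# the user running the query needs a home directory it owns
hdfs dfs -mkdir -p /user/yourname
hdfs dfs -chown yourname:yourname /user/yourname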
01-14-2016
01:56 AM
1 Kudo
This message will pop up any time an application is requesting more resources from the cluster than the cluster can currently provide. What resources, you might ask? Spark is only looking for two things: cores and RAM. Cores represent the number of open executor slots that your cluster provides for execution. RAM refers to the amount of free RAM required on any worker running your application. Note that for both of these resources the maximum value is not your system's max; it is the max set by your Spark configuration.
1. Check out the current state of your cluster (and its free resources) at SparkMasterIP:7080.
2. Make sure you have not started Spark shell in two different terminals. The first Spark shell might consume all the available cores in the system, leaving the second shell waiting for resources. Until the first Spark shell is terminated and its resources are released, all other apps will display the above warning.
The short-term solution to this problem is to make sure you aren't requesting more resources from your cluster than exist, or to shut down any apps that are unnecessarily using resources. If you need to run multiple Spark apps simultaneously, you'll need to adjust the amount of cores being used by each app, for example as sketched below.
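A minimal sketch of capping what each shell asks for so two can share the cluster (the master URL and numbers are placeholders for your own setup):
spark-shell --master spark://sparkmaster:7077 \
  --total-executor-cores 4 \
  --executor-memory 2G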
01-11-2016
11:54 AM
3 Kudos
A temporary table is a convenient way for an application to automatically manage intermediate data generated during a complex query. Rather than manually deleting tables needed only as temporary data in a complex query, Hive automatically deletes all temporary tables at the end of the Hive session in which they are created. The data in these tables is stored in the user's scratch directory rather than in the Hive warehouse directory. The scratch directory effectively acts as the user's data sandbox, located by default in /tmp/hive-<username>. Hive users create temporary tables using the TEMPORARY keyword:
CREATE TEMPORARY TABLE tmp1(c1 string);
CREATE TEMPORARY TABLE tmp2 AS..
CREATE TEMPORARY TABLE tmp3 LIKE..
Multiple Hive users can create multiple Hive temporary tables with the same name because each table resides in a separate session. Temporary tables support most table options, but not all. The following features are not supported:
- Partition columns
- Indexes
A temporary table with the same name as a permanent table will cause all references to that table name to resolve to the temporary table. The user cannot access the permanent table during that session without dropping or renaming the temporary table.
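A small illustration of the session-scoped behaviour through Beeline (table names are made up; the temporary table disappears as soon as the session ends):
cat > /tmp/tmp_table_demo.hql <<'EOF'
CREATE TEMPORARY TABLE tmp_orders AS SELECT * FROM orders WHERE year = 2016;
SELECT COUNT(*) FROM tmp_orders;
EOF
beeline -u jdbc:hive2://localhost:10000/default -f /tmp/tmp_table_demo.hql
# tmp_orders is gone once the beeline session above closes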
01-07-2016
07:21 PM
3 Kudos
Is your HiveServer2 running in HTTP mode? The connection URL when HiveServer2 is running in HTTP mode is:
jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>
where <http_endpoint> is the corresponding HTTP endpoint configured in hive-site.xml. The default value is cliservice, and the default port for HTTP transport mode is 10001.
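For example, connecting from Beeline with the defaults (hostname is a placeholder):
beeline -u "jdbc:hive2://hs2host:10001/default;transportMode=http;httpPath=cliservice"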
01-03-2016
12:14 AM
1 Kudo
HADOOP_HOME=/usr/hdp/current/hadoop-client
HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
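If these are meant to be picked up from the shell environment (an assumption; they may equally belong in a service config or env template), one way is to export them, e.g. in ~/.bashrc:
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar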