Member since: 03-20-2016
Posts: 56
Kudos Received: 18
Solutions: 0
05-08-2016
01:28 PM
I'm trying to create a table in Hive in ORC format and load it with data that I have in a ".tbl" file. In the ".tbl" file each row has this format:
1|Customer#000000001|IVhzIApeRb ot,c,E|15|711.56|BUILDING|to the even, regular platelets. regular, ironic epitaphs nag e|
I create a Hive table in ORC format like this:
create table if not exists partsupp (PS_PARTKEY BIGINT, PS_SUPPKEY BIGINT, PS_AVAILQTY INT, PS_SUPPLYCOST DOUBLE, PS_COMMENT STRING) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");
Now I'm trying to load data into the table like this:
LOAD DATA LOCAL INPATH '/tables/partsupp/partsupp.tbl' [OVERWRITE] INTO TABLE partsupp;
My questions are: is this a correct method to do this? And if it is, do you know why this error happens when I run the LOAD DATA INPATH command? Failed: Parse exception mismatched input '[' expecting INTO near '/tables/partsupp/partsupp.tbl' in load statement
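A minimal sketch of one common approach, assuming the staging table name below (the square brackets in the documentation only mean OVERWRITE is optional and must not be typed, and because LOAD DATA only moves files without converting them, pipe-delimited text is usually loaded into a TEXTFILE staging table first and then inserted into the ORC table):
-- Staging table matching the pipe-delimited .tbl layout (table name is an assumption)
create table if not exists partsupp_staging (PS_PARTKEY BIGINT, PS_SUPPKEY BIGINT, PS_AVAILQTY INT, PS_SUPPLYCOST DOUBLE, PS_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
-- No square brackets around OVERWRITE
LOAD DATA LOCAL INPATH '/tables/partsupp/partsupp.tbl' OVERWRITE INTO TABLE partsupp_staging;
-- Convert to ORC by inserting into the ORC-backed table
INSERT OVERWRITE TABLE partsupp SELECT * FROM partsupp_staging;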
Labels:
- Apache Hive
05-06-2016
07:06 PM
Thanks for your help. And about the diagram of the jobs executed after we run a query, what is the DAG visualization about? Does that visualization show the physical or the logical plan?
05-04-2016
10:28 PM
Thanks for your answer, now I can see the plans. And what is the diagram that appears in the Spark user interface for each job, the DAG Visualization? Is it the logical or the physical plan? Or is it something else? And which diagram were you referring to in your first sentence?
05-04-2016
07:47 PM
Thanks for your answer. But does Spark SQL always use that Catalyst component? Is it part of Spark SQL? Does it use that component every time we execute a query? And do you know how to show the logical and physical plans of the queries?
05-04-2016
05:04 PM
Hi,
I'm executing TPC queries over Hive tables using Spark SQL as below:
var hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
var query = hiveContext.sql(" SELECT ...");
query.show
I have learned how to configure and use Spark SQL up to this point. But now I would like to learn how Spark SQL works internally to execute these queries over Hive tables: things like execution plans, the logical and physical plan, and optimization, so I can better understand how Spark SQL decides which execution plan is best.
I'm trying to find information about this but nothing concrete. Can someone give an overview so I can understand the basics and then look for more detailed information, or do you know some articles or something that explains this? And also, do you know where, or with what command, I can see the logical and physical plans that Spark SQL uses when it executes the queries?
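For the last question, a minimal sketch of how the plans can be printed from the same hiveContext used above (the query text is a placeholder, not a real TPC query):
// Placeholder query text; replace with the actual TPC query
val df = hiveContext.sql("SELECT ...")
// Prints the parsed, analyzed, and optimized logical plans plus the physical plan
df.explain(true)
// The same information is available programmatically through the QueryExecution object
println(df.queryExecution)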
Labels:
- Apache Spark
04-29-2016
11:12 AM
Yes, I'm trying to install it manually, because I think it's better for learning the process of getting Hadoop running.
04-29-2016
11:11 AM
Thanks. Where can I find those logs? I'm trying to find them but without success. And when I execute the echo $HOSTNAME command, the full hostname that I get is what I put in the question.
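A minimal sketch of where the daemon logs usually are in a manual tarball install, assuming the default log directory under $HADOOP_HOME (the file name pattern below reflects the defaults and is an assumption, not something taken from the post):
# Daemon logs go to $HADOOP_HOME/logs by default in a tarball install
ls $HADOOP_HOME/logs
# For example, inspect the DataNode log on a slave machine
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log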
04-28-2016
11:10 PM
1 Kudo
I'm installing Hadoop 2.7.1 on 3 nodes and I'm having some difficulties in the configuration process. I want to have:
node1 (master) - the namenode and resource manager
node2 (slave) - a datanode and nodemanager
node3 (slave) - a datanode and nodemanager
I'm doing the configuration like below to achieve that goal:
etc/hosts file: 127.0.0.1 localhost
192.168.1.60 NameNode
192.168.1.61 Slave1
192.168.1.62 Slave2
core-site.xml: <configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000</value>
</property>
</configuration>
hdfs-site.xml: <configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
In the slaves file I entered the hostnames of the slave machines: Slave1
Slave2
I created a masters file and entered the hostname of the master machine: NameNode
Note: I didn't configure the yarn-site.xml and mapred-site.xml files. Are they needed? (See the sketch after the jps output below.)
Problem: With my configuration above I'm having two issues when I start all the daemons and check with the jps command:
1) the NodeManager appears on the master and not only on the slave machines
2) the DataNode doesn't appear on the slave machines
jps in the master machine: ResourceManager
NameNode
NodeManager
SecondaryNameNode
jps command in slave machines: NodeManager
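Regarding the yarn-site.xml and mapred-site.xml question, a minimal sketch of what they might look like for this layout; the hostname NameNode is taken from the hosts file above, and the property values are common minimal settings rather than anything confirmed in the post:
<!-- yarn-site.xml (minimal sketch; hostname taken from the hosts file above) -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>NameNode</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
<!-- mapred-site.xml (minimal sketch) -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>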
Labels:
- Apache Hadoop
04-04-2016
01:35 PM
Thank you, really. Now it is working! It is just showing some warnings about "version information not found in metastore..." and "failed to get database default, returning NoSuchObjectException". But since they are warnings, it should be working fine, right?
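A minimal sketch of one common way to address the "version information not found in metastore" warning, assuming it comes from an uninitialized metastore schema on a fresh install using the default embedded Derby metastore:
# Initialize the metastore schema once (schematool ships with Hive 1.2.1)
$HIVE_HOME/bin/schematool -dbType derby -initSchema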
04-03-2016
02:02 AM
3 Kudos
Hi, I'm trying to execute queries with Spark SQL over Hive tables stored in a single-node HDFS, but I'm having some problems starting Spark correctly. I already have Hadoop and Hive installed and have already created the tables in Hive with the data stored in HDFS. I will describe my Hadoop and Hive configuration, and I hope that someone who has already tried to execute queries with Spark over Hive tables can help and say what the steps are to install Spark correctly for this purpose. I installed hadoop-2.7.1, extracted the files, added the environment variables, and configured core-site.xml and hdfs-site.xml.
core-site.xml: <configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml: <configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Then I format the namenode with: hadoop namenode -format
Then I start Hadoop with:
./start-yarn.sh
./start-dfs.sh
And it seems that everything works:
[hadoopdadmin@hadoop sbin]$ jps
9601 NameNode
9699 DataNode
10003 Jps
9091 ResourceManager
9894 SecondaryNameNode
9191 NodeManager
Then, after Hadoop was installed, I downloaded Hive 1.2.1 and just extracted the files and added the environment variables. The .bashrc file is like this now:
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-2.7.1
export HIVE_HOME=/usr/local/apache-hive-1.2.1-bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin
To start Hive I just type hive and it seems that it works:
[hadoopadmin@hadoopSingleNode ~]$ hive
Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
I have some tables in Hive that I created with this command:
create table customer (C_CUSTKEY INT, C_NAME STRING, C_ADDRESS
STRING, C_NATIONKEY INT, C_PHONE STRING, C_ACCTBAL DOUBLE, C_MKTSEGMENT
STRING, C_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION '/tables/customer';
Now it's time to install Spark to query these Hive tables. What I'm doing is just downloading this version "http://www.apache.org/dyn/closer.lua/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz", extracting the files, and configuring environment variables. After this, with spark-shell I'm getting a lot of errors. I have already tried a lot of things but nothing is working to fix the issues, so can someone see what is not ok in my configuration steps or what is missing here? Errors that are appearing after executing the spark-shell command:
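A minimal sketch of the environment setup that is commonly needed so spark-shell can find Hadoop and the Hive metastore; the paths reuse the .bashrc entries above, and the Spark install directory is an assumption:
# Assumed Spark install location; adjust to wherever the tarball was extracted
export SPARK_HOME=/usr/local/spark-1.6.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
# Let Spark pick up the HDFS/YARN configuration
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Share Hive's configuration with Spark SQL (if a hive-site.xml has been created)
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/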
Labels:
- Apache Hive
- Apache Spark