Member since: 03-20-2016
Posts: 56
Kudos Received: 18
Solutions: 0
05-08-2016
01:28 PM
I'm trying to create a table in Hive in ORC format and load it with data that I have in a ".tbl" file. In the ".tbl" file each row has this format:
1|Customer#000000001|IVhzIApeRb ot,c,E|15|711.56|BUILDING|to the even, regular platelets. regular, ironic epitaphs nag e|
I create a Hive table in ORC format like this:
create table if not exists partsupp (PS_PARTKEY BIGINT, PS_SUPPKEY BIGINT, PS_AVAILQTY INT, PS_SUPPLYCOST DOUBLE, PS_COMMENT STRING) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");
Now I'm trying to load data into the table like this:
LOAD DATA LOCAL INPATH '/tables/partsupp/partsupp.tbl' [OVERWRITE] INTO TABLE partsupp;
My questions are: is this a correct method to do this? And if it is, do you know why this error happens when I run the LOAD DATA INPATH command? Failed: Parse exception mismatched input '[' expecting INTO near '/tables/partsupp/partsupp.tbl' in load statement
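A minimal sketch of one common approach, assuming the staging table name below (the square brackets in the documentation only mean OVERWRITE is optional and must not be typed, and because LOAD DATA only moves files without converting them, pipe-delimited text is usually loaded into a TEXTFILE staging table first and then inserted into the ORC table):
-- Staging table matching the pipe-delimited .tbl layout (table name is an assumption)
create table if not exists partsupp_staging (PS_PARTKEY BIGINT, PS_SUPPKEY BIGINT, PS_AVAILQTY INT, PS_SUPPLYCOST DOUBLE, PS_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
-- No square brackets around OVERWRITE
LOAD DATA LOCAL INPATH '/tables/partsupp/partsupp.tbl' OVERWRITE INTO TABLE partsupp_staging;
-- Convert to ORC by inserting into the ORC-backed table
INSERT OVERWRITE TABLE partsupp SELECT * FROM partsupp_staging;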
Labels:
- Apache Hive
05-06-2016
07:06 PM
Thanks for your help. And about the diagram of the jobs executed after we run a query, what is the DAG visualization about? Does that visualization show the physical or the logical plan?
05-04-2016
10:28 PM
Thanks for your answer, now I can see the plans. And what is the diagram that appears in the Spark user interface for each job, the DAG Visualization? Is it the logical or the physical plan? Or is it something else? And which diagram were you referring to in your first sentence?
05-04-2016
07:47 PM
Thanks for your answer. But does Spark SQL always use that Catalyst component? Is it part of Spark SQL? Does it use that component every time we execute a query? And do you know how to show the logical and physical plans of the queries?
05-04-2016
05:04 PM
Hi,
I'm executing TPC queries over Hive tables using Spark SQL as below:
var hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
var query = hiveContext.sql(" SELECT ...");
query.show
I have learned how to configure and use Spark SQL up to this point. But now I would like to learn how Spark SQL works internally to execute these queries over Hive tables: things like execution plans, the logical and physical plan, and optimization, so I can better understand how Spark SQL decides which execution plan is best.
I'm trying to find information about this but nothing concrete. Can someone give an overview so I can understand the basics and then look for more detailed information, or do you know some articles or something that explains this? And also, do you know where, or with what command, I can see the logical and physical plans that Spark SQL uses when it executes the queries?
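For the last question, a minimal sketch of how the plans can be printed from the same hiveContext used above (the query text is a placeholder, not a real TPC query):
// Placeholder query text; replace with the actual TPC query
val df = hiveContext.sql("SELECT ...")
// Prints the parsed, analyzed, and optimized logical plans plus the physical plan
df.explain(true)
// The same information is available programmatically through the QueryExecution object
println(df.queryExecution)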
Labels:
- Apache Spark
04-29-2016
11:12 AM
Yes, I'm trying to install it manually, because I think it's better for learning the process of getting Hadoop running.
04-29-2016
11:11 AM
Thanks. Where can I find those logs? I'm trying to find them but without success. And when I execute the echo $HOSTNAME command, the full hostname that I get is what I put in the question.
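A minimal sketch of where the daemon logs usually are in a manual tarball install, assuming the default log directory under $HADOOP_HOME (the file name pattern below reflects the defaults and is an assumption, not something taken from the post):
# Daemon logs go to $HADOOP_HOME/logs by default in a tarball install
ls $HADOOP_HOME/logs
# For example, inspect the DataNode log on a slave machine
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log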
04-28-2016
11:10 PM
1 Kudo
I'm installing Hadoop 2.7.1 on 3 nodes and I'm having some difficulties in the configuration process. I want to have:
node1 (master) - the namenode and resource manager
node2 (slave) - a datanode and nodemanager
node3 (slave) - a datanode and nodemanager
I'm doing the configuration like below to achieve that goal:
etc/hosts file: 127.0.0.1 localhost
192.168.1.60 NameNode
192.168.1.61 Slave1
192.168.1.62 Slave2
core-site.xml: <configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000</value>
</property>
</configuration>
hdfs-site.xml: <configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
In the slaves file I entered the hostnames of the slave machines: Slave1
Slave2
I created a masters file and entered the hostname of the master machine: NameNode
Note: I didn't configure the yarn-site.xml and mapred-site.xml files. Are they needed? (See the sketch after the jps output below.)
Problem: With my configuration above I'm having two issues when I start all the daemons and check with the jps command:
1) the NodeManager appears on the master and not only on the slave machines
2) the DataNode doesn't appear on the slave machines
jps in the master machine: ResourceManager
NameNode
NodeManager
SecondaryNameNode
jps command in slave machines: NodeManager
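Regarding the yarn-site.xml and mapred-site.xml question, a minimal sketch of what they might look like for this layout; the hostname NameNode is taken from the hosts file above, and the property values are common minimal settings rather than anything confirmed in the post:
<!-- yarn-site.xml (minimal sketch; hostname taken from the hosts file above) -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>NameNode</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
<!-- mapred-site.xml (minimal sketch) -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>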
Labels:
- Apache Hadoop
04-04-2016
01:35 PM
Thank you, really. Now it is working! It is just showing some warnings about "version information not found in metastore..." and "failed to get database default, returning NoSuchObjectException". But since they are warnings, it should be working fine, right?
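A minimal sketch of one common way to address the "version information not found in metastore" warning, assuming it comes from an uninitialized metastore schema on a fresh install using the default embedded Derby metastore:
# Initialize the metastore schema once (schematool ships with Hive 1.2.1)
$HIVE_HOME/bin/schematool -dbType derby -initSchema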
04-03-2016
02:02 AM
3 Kudos
Hi, I'm trying to execute queries with Spark SQL over Hive tables stored in a single-node HDFS, but I'm having some problems starting Spark correctly. I already have Hadoop and Hive installed and have already created the tables in Hive with the data stored in HDFS. I will describe my Hadoop and Hive configuration, and I hope that someone who has already tried to execute queries with Spark over Hive tables can help and say what the steps are to install Spark correctly for this purpose. I installed hadoop-2.7.1, extracted the files, added the environment variables, and configured core-site.xml and hdfs-site.xml.
core-site.xml: <configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml: <configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Then I format the namenode with: hadoop namenode -format
Then I start Hadoop with:
./start-yarn.sh
./start-dfs.sh
And it seems that everything works:
[hadoopdadmin@hadoop sbin]$ jps
9601 NameNode
9699 DataNode
10003 Jps
9091 ResourceManager
9894 SecondaryNameNode
9191 NodeManager
Then, after Hadoop was installed, I downloaded Hive 1.2.1 and just extracted the files and added the environment variables. The .bashrc file is like this now:
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-2.7.1
export HIVE_HOME=/usr/local/apache-hive-1.2.1-bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin
To start Hive I just type hive and it seems that it works:
[hadoopadmin@hadoopSingleNode ~]$ hive
Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
I have some tables in Hive that I created with this command:
create table customer (C_CUSTKEY INT, C_NAME STRING, C_ADDRESS
STRING, C_NATIONKEY INT, C_PHONE STRING, C_ACCTBAL DOUBLE, C_MKTSEGMENT
STRING, C_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION '/tables/customer';
Now it's time to install Spark to query these Hive tables. What I'm doing is just downloading this version "http://www.apache.org/dyn/closer.lua/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz", extracting the files, and configuring environment variables. After this, with spark-shell I'm getting a lot of errors. I have already tried a lot of things but nothing is working to fix the issues, so can someone see what is not ok in my configuration steps or what is missing here? Errors that are appearing after executing the spark-shell command:
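A minimal sketch of the environment setup that is commonly needed so spark-shell can find Hadoop and the Hive metastore; the paths reuse the .bashrc entries above, and the Spark install directory is an assumption:
# Assumed Spark install location; adjust to wherever the tarball was extracted
export SPARK_HOME=/usr/local/spark-1.6.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
# Let Spark pick up the HDFS/YARN configuration
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Share Hive's configuration with Spark SQL (if a hive-site.xml has been created)
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/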
Labels:
- Apache Hive
- Apache Spark