Member since 06-17-2016 · 56 Posts · 6 Kudos Received · 0 Solutions
04-04-2018 09:55 AM
Hi everyone, I am going crazy trying to figure out why I cannot read a Hive external table that points to a directory with Parquet files. The Parquet files are created by a Spark program like this:

eexTable.repartition(1).write.mode("append").save(dataPath.concat(eexFileName))

I created the external table using this DDL:

CREATE EXTERNAL TABLE my_db.eex_actual_plant_gen_line (
meta_date timestamp,
meta_directory string ,
meta_filename string,
meta_host string,
meta_numberofrecords int,
meta_recordnumber int,
meta_sequenceid string,
country string,
source string,
data_timestamp timestamp,
actual_generation double,
publication_timestamp timestamp,
modification_timestamp timestamp,
created_on timestamp
)
COMMENT 'External table eex transparency actual plant generation line'
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '/development/data/EEX/transparency/production/usage'

I am able to query this table using Ambari or the CLI, but when I try to use Spark, I can retrieve the table schema, yet no rows are returned:

import org.apache.spark.sql.{ Row, SaveMode, SparkSession }
import org.apache.spark.sql.functions._
val warehouseLocation = "/apps/hive/warehouse"
val spark = SparkSession
.builder()
.appName("EEX_Trans")
.config("spark.sql.warehouse.dir", warehouseLocation)
.config("hive.metastore.uris", "thrift://myserver1:9083,thrift://myserver2:9083")
.enableHiveSupport()
.getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
spark.sql("Select * from dev_sdsp.facilities").count() I cannot find the error and I already read 1000 posts without luck. Any comment will be appreciated. Kind regards, Paul
Labels: Apache Hive, Apache Spark
02-23-2018 03:23 PM
Hi @smanjee, hi @Sridhar Reddy, I am trying to add phoenix-hive.jar to Hive permanently and make it generally available to external applications, but I don't know exactly how to do that with Ambari. I set the hive.aux.jars.path property in the custom hive-site in Ambari to a location in HDFS, in order to reach the jars centrally, but it is not working. If I create an auxlib folder on a node, then it works, but only for that node. Any comment will be appreciated. Kind regards, Paul
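If I read the documentation correctly (an assumption on my part, not verified), hive.aux.jars.path expects comma-separated local file paths rather than an HDFS location, which would explain why only the node with the auxlib folder works. In that case the jar has to be copied to every HiveServer2 node and referenced like this, with the path being a guess:

hive.aux.jars.path=file:///usr/hdp/current/hive-server2/auxlib/phoenix-hive.jar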
02-23-2018 03:12 PM
Hi @bthiyagarajan, thanks for the article. I would like to know why this does not work; I mean, why can I not set this property using the custom hive-site in Ambari? And how should I specify the value of HIVE_AUX_JARS_PATH in the jinja template? Many thanks in advance. Kind regards, Paul
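For future readers, this is roughly the line I would expect to append to the Advanced hive-env template in Ambari. Treat it as an unverified sketch; the local auxlib directory is an assumption:

# Appended to the hive-env jinja template; the directory path is a guess.
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-server2/auxlib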
02-01-2018 11:13 AM
Hi @Krishnaswami Rajagopalan, I don't know the exact details of this sandbox in the Azure cloud. Are you connecting to the sandbox or to the Docker container inside it? The Docker container is where Zeppelin and the other services are located. To connect to the Docker container, use port 2222 in your SSH command. Example:

ssh root@127.0.0.1 -p 2222

I guess it doesn't matter whether your cluster or sandbox is running in the cloud; you should be able to find Zeppelin under /usr/hdp/current/zeppelin-server. Hope this helps. BR, Paul
11-25-2017 07:26 AM
Hi @Chiranjeevi Nimmala, you can have a look at my latest blog post; it may help you: Installing Apache Zeppelin 0.7.3 in HDP 2.5.3 with Spark and Spark2 Interpreters
11-08-2017 08:43 AM
Hi everyone, could anyone confirm the information I found in this nice blog entry, How To Locally Install & Configure Apache Spark & Zeppelin?

1) Python 3.6 will break PySpark. Use any version < 3.6.
2) PySpark doesn't play nicely with Python 3.6; any other version will work fine.

Many thanks in advance! Paul
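In case it helps: if the claim holds for your Spark version (my assumption is that newer Spark releases fixed the Python 3.6 incompatibility, but please verify), the usual workaround is to pin the interpreter PySpark uses. The interpreter location below is a guess:

# In zeppelin-env.sh or spark-env.sh; the Python 3.5 path is an assumption.
export PYSPARK_PYTHON=/usr/bin/python3.5
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5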
Labels: Apache Spark, Apache Zeppelin
02-09-2017 12:52 PM
Hi Daniel, thanks for sharing this info. However, I am facing exactly the same problem, with the same error message. I am using the HDP 2.5 sandbox and I am not able to navigate to Zeppelin. I verified the permissions of the webapps folder, and they are correct. Do you have any suggestions? Any comment will be appreciated. Thanks in advance, Paul
02-01-2017 08:20 AM
Thanks a lot! After following the @Michael Young article I was able to successfully run my Talend job.
01-31-2017 01:31 PM
Hi everyone, I am trying to load files into an HDP 2.5 sandbox for VirtualBox. I am using Talend Open Studio 6.3. My host system is a Windows 7 laptop connected to the corporate network. I set up a NAT network for the VM and created a forwarding rule for port 50010 (host IP 127.0.0.1, host port 50010, guest port 50010). I also added 127.0.0.1 sandbox.hortonworks.com to the hosts file in Windows.

I am getting the following error while running the Talend job:

File xxx could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Also, the data node is running and I have enough space. According to the community, the only remaining explanation is that the data node name could not be resolved, but I tried all of the suggestions without success. What else could be wrong? Any comment will be appreciated. Kind regards, Paul
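For anyone who hits this later: one cause I have seen described (an assumption here, not a confirmed diagnosis) is that the namenode hands the client the datanode's internal IP, which is unreachable through NAT, so the client excludes that node. Telling the HDFS client to address datanodes by hostname can work around it. A minimal sketch; the namenode URI and file paths are assumptions:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// Connect to datanodes by hostname instead of the internal IP reported by
// the namenode; the hostname resolves via the Windows hosts file entry.
conf.set("dfs.client.use.datanode.hostname", "true")

val fs = FileSystem.get(new URI("hdfs://sandbox.hortonworks.com:8020"), conf)
fs.copyFromLocalFile(new Path("C:/tmp/sample.csv"), new Path("/tmp/sample.csv"))
fs.close()

In Talend, the equivalent would presumably be adding dfs.client.use.datanode.hostname=true to the connection's Hadoop properties.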
Labels: Hortonworks Data Platform (HDP)
06-17-2016 03:35 PM
Hi, it isn't officially supported, but I tried it and it works.