Member since 06-17-2016 · 56 Posts · 6 Kudos Received · 0 Solutions
04-04-2018 09:55 AM
Hi everyone, I am going crazy trying to figure out why I cannot read a Hive external table that points to a directory with Parquet files. The Parquet files are created by a Spark program like this:

eexTable.repartition(1).write.mode("append").save(dataPath.concat(eexFileName))

I created the external table using this DDL:

CREATE EXTERNAL TABLE my_db.eex_actual_plant_gen_line (
meta_date timestamp,
meta_directory string ,
meta_filename string,
meta_host string,
meta_numberofrecords int,
meta_recordnumber int,
meta_sequenceid string,
country string,
source string,
data_timestamp timestamp,
actual_generation double,
publication_timestamp timestamp,
modification_timestamp timestamp,
created_on timestamp
)
COMMENT 'External table eex transparency actual plant generation line'
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '/development/data/EEX/transparency/production/usage'

I am able to query this table using Ambari or the CLI, but when I try to use Spark, I can retrieve the table schema, yet no rows are returned:

import org.apache.spark.sql.{ Row, SaveMode, SparkSession }
import org.apache.spark.sql.functions._
val warehouseLocation = "/apps/hive/warehouse"
val spark = SparkSession
.builder()
.appName("EEX_Trans")
.config("spark.sql.warehouse.dir", warehouseLocation)
.config("hive.metastore.uris", "thrift://myserver1:9083,thrift://myserver2:9083")
.enableHiveSupport()
.getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
spark.sql("Select * from dev_sdsp.facilities").count() I cannot find the error and I already read 1000 posts without luck. Any comment will be appreciated. Kind regards, Paul
Labels: Apache Hive, Apache Spark
02-23-2018 03:23 PM
Hi @smanjee, hi @Sridhar Reddy, I am trying to add phoenix-hive.jar to Hive permanently and make it generally available to external applications, but I don't know exactly how to do that with Ambari. I set the hive.aux.jars.path property in the custom hive-site in Ambari to a location in HDFS, in order to reach the jars centrally, but it is not working. If I create an auxlib folder on a node, then it works, but only for that node. Any comment will be appreciated. Kind regards, Paul
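If I read the documentation correctly (an assumption on my part, not verified), hive.aux.jars.path expects comma-separated local file paths rather than an HDFS location, which would explain why only the node with the auxlib folder works. In that case the jar has to be copied to every HiveServer2 node and referenced like this, with the path being a guess:

hive.aux.jars.path=file:///usr/hdp/current/hive-server2/auxlib/phoenix-hive.jar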
02-23-2018 03:12 PM
Hi @bthiyagarajan, thanks for the article. I would like to know why this does not work; I mean, why can I not set this property using the custom hive-site in Ambari? And how should I specify the value of HIVE_AUX_JARS_PATH in the jinja template? Many thanks in advance. Kind regards, Paul
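For future readers, this is roughly the line I would expect to append to the Advanced hive-env template in Ambari. Treat it as an unverified sketch; the local auxlib directory is an assumption:

# Appended to the hive-env jinja template; the directory path is a guess.
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-server2/auxlib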
02-01-2018 11:13 AM
Hi @Krishnaswami Rajagopalan, I don't know the exact details of this sandbox in the Azure cloud. Are you connecting to the sandbox or to the Docker container inside it? The Docker container is where Zeppelin and the other services are located. To connect to the Docker container, use port 2222 in your SSH command. Example:

ssh root@127.0.0.1 -p 2222

I guess it doesn't matter whether your cluster or sandbox is running in the cloud; you should be able to find Zeppelin under /usr/hdp/current/zeppelin-server. Hope this helps. BR, Paul
11-25-2017 07:26 AM
Hi @Chiranjeevi Nimmala, you can have a look at my latest blog post; it may help you: Installing Apache Zeppelin 0.7.3 in HDP 2.5.3 with Spark and Spark2 Interpreters
11-08-2017 08:43 AM
Hi everyone, could anyone confirm the information I found in this nice blog entry, How To Locally Install & Configure Apache Spark & Zeppelin?

1) Python 3.6 will break PySpark. Use any version < 3.6.
2) PySpark doesn't play nicely with Python 3.6; any other version will work fine.

Many thanks in advance! Paul
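In case it helps: if the claim holds for your Spark version (my assumption is that newer Spark releases fixed the Python 3.6 incompatibility, but please verify), the usual workaround is to pin the interpreter PySpark uses. The interpreter location below is a guess:

# In zeppelin-env.sh or spark-env.sh; the Python 3.5 path is an assumption.
export PYSPARK_PYTHON=/usr/bin/python3.5
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5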
Labels: Apache Spark, Apache Zeppelin
02-09-2017 12:52 PM
Hi Daniel, thanks for sharing this info. However, I am facing exactly the same problem, with the same error message. I am using the HDP 2.5 sandbox and I am not able to navigate to Zeppelin. I verified the permissions of the webapps folder, and they are correct. Do you have any suggestions? Any comment will be appreciated. Thanks in advance, Paul
02-01-2017 08:20 AM
Thanks a lot! After following the @Michael Young article I was able to successfully run my Talend job.
01-31-2017 01:31 PM
Hi everyone, I am trying to load files into an HDP 2.5 sandbox for VirtualBox. I am using Talend Open Studio 6.3. My host system is a Windows 7 laptop connected to the corporate network. I set up a NAT network for the VM and created a forwarding rule for port 50010 (host IP 127.0.0.1, host port 50010, guest port 50010). I also added 127.0.0.1 sandbox.hortonworks.com to the hosts file in Windows.

I am getting the following error while running the Talend job:

File xxx could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Also, the data node is running and I have enough space. According to the community, the only remaining explanation is that the data node name could not be resolved, but I tried all of the suggestions without success. What else could be wrong? Any comment will be appreciated. Kind regards, Paul
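For anyone who hits this later: one cause I have seen described (an assumption here, not a confirmed diagnosis) is that the namenode hands the client the datanode's internal IP, which is unreachable through NAT, so the client excludes that node. Telling the HDFS client to address datanodes by hostname can work around it. A minimal sketch; the namenode URI and file paths are assumptions:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// Connect to datanodes by hostname instead of the internal IP reported by
// the namenode; the hostname resolves via the Windows hosts file entry.
conf.set("dfs.client.use.datanode.hostname", "true")

val fs = FileSystem.get(new URI("hdfs://sandbox.hortonworks.com:8020"), conf)
fs.copyFromLocalFile(new Path("C:/tmp/sample.csv"), new Path("/tmp/sample.csv"))
fs.close()

In Talend, the equivalent would presumably be adding dfs.client.use.datanode.hostname=true to the connection's Hadoop properties.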
Labels: Hortonworks Data Platform (HDP)
06-17-2016 03:35 PM
Hi, it isn't officially supported, but I tried it and it works.