Member since: 06-28-2017
Posts: 279
Kudos Received: 43
Solutions: 24
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1132 | 12-24-2018 08:34 AM
| 3583 | 12-24-2018 08:21 AM
| 1166 | 08-23-2018 07:09 AM
| 5701 | 08-21-2018 05:50 PM
| 3040 | 08-20-2018 10:59 AM
12-26-2018
08:16 AM
Hi, I can't guarantee this, but according to the documentation the global index in Phoenix is intended for 'heavy read' usage. So if that fits your workload, your query should end up using the secondary index (of course only when the query goes through Phoenix). Regards Harald
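Just as an illustration (table, column and index names below are placeholders, not from your setup), a global index is created like this, and EXPLAIN tells you whether the optimizer actually uses it:
-- hypothetical table/column names, for illustration only
CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (order_date, total);
EXPLAIN SELECT order_date, total FROM orders WHERE customer_id = 42;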
... View more
12-26-2018
07:53 AM
Hi @Armanur Rahman, from the error message this looks like a simple syntax error. I think this is the statement causing it: CREATE TABLE IF NOT EXISTS `hr.f_company_xml_tab` ( `RECID` STRING, `XMLTYPE.GETSTRINGVAL(XMLRECORD)` STRING) Your second column is being named 'XMLTYPE.GETSTRINGVAL(XMLRECORD)', which contains a '(' just as the error message claims. Can you rename the column to a simpler name, e.g. 'val', and try again? Regards Harald
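For illustration, with a simpler column name (the name 'val' is just a placeholder, pick whatever fits your data) the statement would look like this:
CREATE TABLE IF NOT EXISTS `hr.f_company_xml_tab` (
  `RECID` STRING,
  `val` STRING);  -- 'val' is a placeholder column name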
... View more
12-26-2018
07:29 AM
Hi, so the Spark jobs have not finished during the few hours? Have they all been hanging? Or did they finish, although NoRouteToHost errors were logged? And you shut down a full physical server, which means you 'lose' 4 of the 8 VMs at the same time, not just one VM? To make sure you don't lose data in this event (it is a 50% loss), you will need to make your cluster rack aware, so that the replicas are guaranteed to be created in both racks (set the rack correctly in your host overview in Ambari, don't leave it at 'default-rack'). Otherwise, with the default replication factor of 3, the cluster is only guaranteed to continue without data loss when at most 2 data nodes are lost. Losing 4 machines can leave you with data the cluster can't recover, since some files/blocks might be located entirely on the machines being shut down. Regards Harald
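As a quick check after assigning the racks in Ambari, you can verify what HDFS itself sees (a sketch; run it with an HDFS superuser, and the path below is a placeholder):
hdfs dfsadmin -printTopology
hdfs fsck /data/important -files -blocks -racks   # shows on which racks the block replicas live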
... View more
12-24-2018
11:39 AM
Hi Ken, the filesystem /var/lib/docker/tmp is located on the Docker host, so depending on how you installed it, that is your AWS CentOS instance or maybe the Sandbox VM itself, where Docker runs. So check the size of the filesystem there (df -h /var/lib/docker/tmp). If you run the Sandbox, there might even be restrictions imposed by the VirtualBox VM executing the Sandbox. Regards Harald
... View more
12-24-2018
11:33 AM
Hi @Abhimanyu Dasarwar, where does it show your memory to be 7.7 GB? Regards Harald
... View more
12-24-2018
11:26 AM
Hi @Pat ODowd, what exactly do you mean by the IP address becoming invalid? Will another machine get that IP address? And does the machine get a new IP address after a restart? 'No route to host' is what IP is supposed to report when the host with that IP disappears from the network. In any case, taking the machine down will have an impact on currently running jobs, which is different from shutting down the service in the Ambari UI, where running jobs finish and new jobs are handled by other nodes. Bringing down the machine will interrupt currently active jobs, though they are supposed to recover after the TCP timeout. The cluster control also needs to become aware of the 'lost' node before letting other nodes handle the jobs. UDP communication is simply lost, while TCP will retransmit a few times before declaring the connection broken. This takes some time (depending on the settings, typically between 30 and 90 seconds). Determining that a port isn't listening on an active machine while trying to connect is much faster than determining that an established TCP connection is broken, due to the specifics of TCP. So I guess the main point is: why don't your jobs recover without intervention? How long did you wait? Regards Harald
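If you want to inspect the kernel settings involved on your nodes (a sketch; the exact timeout behaviour also depends on the application's own socket options):
sysctl net.ipv4.tcp_syn_retries   # SYN retransmissions before a new connection attempt fails
sysctl net.ipv4.tcp_retries2      # retransmissions on an established connection before it is declared broken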
... View more
12-24-2018
09:51 AM
Hi @Shesh Kumar, I don't think you can configure this via Ambari (though I'm not completely sure). What you could do is either set up replication in HBase or simply configure both as one cluster and make it rack aware, to ensure the data is replicated into both racks (i.e. both locations). Regards Harald
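If you go the HBase replication route, the setup is done in the HBase shell on the source cluster, roughly like this (a sketch; the ZooKeeper quorum, table and column family names are placeholders, and replication must be enabled on both clusters):
add_peer '1', CLUSTER_KEY => "zk1.backup.example.com,zk2.backup.example.com,zk3.backup.example.com:2181:/hbase"
alter 'my_table', {NAME => 'cf', REPLICATION_SCOPE => '1'}   # enables replication for this column family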
... View more
12-24-2018
09:48 AM
Hi @Praveen Kumar, this is a quite generic question, so a precise answer is difficult. But given that you want to get the data from an RDBMS, I think you can go with one cluster. What you need to consider is how much throughput you will have to handle; typically the RDBMS will limit this anyway. Just for consideration: Spark and Flink are mainly RAM intensive, while HBase uses HDD and RAM, depending on the load. Hive again mainly uses HDD for MapReduce. But if you plan to create an external Hive table pointing to HBase, you are back in the HBase usage pattern. Assuming you have sufficient RAM available on your nodes (I would go with >= 2 GB per CPU core), I think one cluster for everything would do. If the load increases later, you can scale the cluster out, which is one of the big advantages of Hadoop. Regards Harald
... View more
12-24-2018
09:37 AM
Hi @A Sabatino, thanks for the info. Would be great if you click on 'accept' for the answer. Helps everyone to see the issue is resolved and provides you and me with a reward in terms of reputation points 🙂 Regards Harald
... View more
12-24-2018
08:34 AM
1 Kudo
Hi @hr pyo This really depends, and you will have to understand authentication with SSL to get all the details. In short: if you use self-signed certificates, or you sign the certificates with your own CA, you will get browser warnings about an insecure connection. This means the user has to confirm each time that he wants to continue, until you install either the server's certificate or the CA certificate into the browser. However, there are preinstalled root CAs in every browser. So if you get your certificate signed by one of those root CAs, you don't have to install the certificate itself; due to the chain of trust the browser accepts the signed certificate without further steps. To get a free-of-charge signed certificate you can use Let's Encrypt. At the enterprise level you usually have an enterprise CA whose certificate is installed on all enterprise machines, and you let your certificate be signed by that enterprise CA. Regards Harald
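With Let's Encrypt, for example, the certbot client can request a signed certificate roughly like this (a sketch; the domain is a placeholder and the host must be reachable from the internet on port 80):
certbot certonly --standalone -d ambari.example.com   # example.com domain is a placeholder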
... View more
12-24-2018
08:21 AM
Hi @A Sabatino, I am not sure why you expect that date to result from your epoch value. From what I can see, your value is not what you expect; the conversion itself is fine. The expression language documentation (https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#dates) describes that the value is interpreted as milliseconds since January 1, 1970 00:00:00 GMT. Interpreting '1 545 266 262' as milliseconds results in roughly 17.8 days, so a time on January 18, 1970 is the correct result. It looks as if you lost a factor of 1000 somewhere in your epoch value. Regards Harald
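As an illustration, if your value is in seconds, multiplying by 1000 before formatting should give the date you expect (a sketch; 'epoch' is a placeholder attribute name):
${epoch:toNumber():multiply(1000):format("yyyy-MM-dd HH:mm:ss", "GMT")}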
... View more
12-23-2018
01:25 PM
Not entirely sure, but can you try the hdfs command instead? It should be configured to include the necessary jars for the execution: hdfs dfs -copyFromLocal trial.txt hdfs://sandbox-hdp.hortonworks.com:8020/tmp/
... View more
12-23-2018
01:07 PM
On your dev server, do you have any Hive table defined that you can query? What actually happens when you query the table in Hive?
... View more
12-23-2018
01:01 PM
To upload the file from your Windows machine to a Linux machine, you can use a tool like WinSCP. You configure the session for the Linux machine almost identically to the configuration in PuTTY, and it gives you a GUI to copy files. If, on the other hand, you need to access the Windows machine from Linux, you need to configure an FTP or, better, SFTP server on Windows that allows access to your NTFS path. Or you use Windows network shares and install Samba, an implementation of Windows networking, on the Linux machine.
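If you prefer the command line over the WinSCP GUI, the pscp tool that ships with PuTTY does the same job (a sketch; the local path, user and host are placeholders for your environment):
pscp C:\data\trial.txt myuser@sandbox-hdp.hortonworks.com:/tmp/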
... View more
12-22-2018
06:50 PM
I am guessing a little here, but I believe it's possible that the Hive metastore has statistics (i.e. information on the number of records in the partitions), so the count might actually not read the complete table. The count on the file must read the file in any case. But still, I think 12 minutes is really long for processing 3.8 GB, even if that is the compressed size. Is the count the very first action on the data frame? In that case Spark only executes all the previous statements (I guess reading the file, uncompressing it, etc.) when running the count.
... View more
12-22-2018
08:03 AM
Hi Rajeswaran, I guess you are just using Ambari? Or have you implemented some of your own Python code anywhere? Can you perhaps post some details on what action you are trying to execute? Regards Harald
... View more
12-22-2018
07:58 AM
Hi Ajay, here is a sizing guide, which seems to address exactly your questions: https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html Still, I personally wouldn't start with 8 GB RAM per node but with at least 16 GB (2 GB per core). In any case you will have to be clear about the throughput needed (GB/sec), not only the overall volume. Regards Harald
... View more
12-22-2018
07:46 AM
Hi Sindhu, do you get any error message when it fails? Regards Harald
... View more
12-22-2018
07:34 AM
Can you perhaps also let us know how you try to read the file and the Hive table? Also, where is the file stored?
... View more
08-27-2018
08:39 AM
1 Kudo
It's described at the link, but it's just a few steps (actually for a test setup):
docker pull registry
docker run -d -p 5000:5000 --restart always --name registry registry:2
Now on the machine where you executed the above commands, a Docker registry is available at port 5000. The parameters have this meaning:
-d: run the container in the background and print the container ID
-p: publish a container's port(s) to the host
--restart: restart policy to apply when a container exits
--name: assign a name to the container
For further options refer to: https://docs.docker.com/engine/reference/commandline/run/ For information on how to set it up for real use (not just testing or demonstration): https://docs.docker.com/registry/deploying/ The last link provides you with important information, like how to set up keys and use it in an orchestrated environment.
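To actually use the registry, you tag an image with the registry's host and port and push it (a sketch; 'myimage' is a placeholder for one of your local images):
docker tag myimage localhost:5000/myimage   # 'myimage' is a placeholder
docker push localhost:5000/myimage
docker pull localhost:5000/myimage          # works from any host that can reach port 5000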
... View more
08-25-2018
07:05 AM
It really depends on how you mapped the primary key during the migration to HBase. If you mapped it into a column PK within the column family, you will not see the PK column as in the original table (e.g. Oracle). I suggest you try it as below, which maps the HBase rowkey into the column "rowkey" in your Phoenix view and also maps the column PK from the column family in case it exists. CREATE VIEW "FNL.ADDRESSES" (
rowkey VARCHAR PRIMARY KEY,
"cf".HJMPTS DECIMAL(20,0),
"cf".CREATEDTS TIME,
"cf".MODIFIEDTS TIME,
"cf".TYPEPKSTRING DECIMAL(20,0),
"cf".OWNERPKSTRING DECIMAL(20,0),
"cf".PK DECIMAL(20,0),
"cf".P_ORIGINAL DECIMAL(20,0),
"cf".P_DUPLICATE DECIMAL(1,0),
"cf".P_APPARTMENT VARCHAR(255),
"cf".P_BUILDING VARCHAR(255),
"cf".P_CELLPHONE VARCHAR(255),
"cf".P_COMPANY VARCHAR(255),
"cf".P_COUNTRY DECIMAL(20,0),
"cf".P_DEPARTMENT VARCHAR(255),
"cf".P_DISTRICT VARCHAR(255),
"cf".P_EMAIL VARCHAR(255),
"cf".P_FAX VARCHAR(255),
"cf".P_FIRSTNAME VARCHAR(255),
"cf".P_LASTNAME VARCHAR(255),
"cf".P_MIDDLENAME VARCHAR(255),
"cf".P_MIDDLENAME2 VARCHAR(255),
"cf".P_PHONE1 VARCHAR(255),
"cf".P_PHONE2 VARCHAR(255),
"cf".P_POBOX VARCHAR(255),
"cf".P_POSTALCODE VARCHAR(255),
"cf".P_REGION DECIMAL(20,0),
"cf".P_STREETNAME VARCHAR(255),
"cf".P_STREETNUMBER VARCHAR(255),
"cf".P_TITLE DECIMAL(20,0),
"cf".P_TOWN VARCHAR(255),
"cf".P_GENDER DECIMAL(20,0),
"cf".P_DATEOFBIRTH TIME,
"cf".P_REMARKS VARCHAR(255),
"cf".P_URL VARCHAR(255),
"cf".P_SHIPPINGADDRESS DECIMAL(1,0),
"cf".P_UNLOADINGADDRESS DECIMAL(1,0),
"cf".P_BILLINGADDRESS DECIMAL(1,0),
"cf".P_CONTACTADDRESS DECIMAL(1,0),
"cf".P_VISIBLEINADDRESSBOOK DECIMAL(1,0),
"cf".P_STATE VARCHAR(255),
"cf".P_LANDMARK VARCHAR(255),
"cf".P_CODELIGIBLE DECIMAL(1,0),
"cf".ACLTS DECIMAL(20,0),
"cf".PROPTS DECIMAL(20,0),
"cf".P_ISHOMEADDRESS DECIMAL(1,0)
);
... View more
08-24-2018
10:41 AM
Have you set up SSL or Kerberos as the security for your Kafka broker? In that case, have a look here: https://community.hortonworks.com/content/supportkb/150148/errorwarn-bootstrap-broker-6668-disconnected-orgap.html
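If the broker is kerberized, the client needs matching security settings, roughly like this in the client properties (a sketch assuming SASL/Kerberos without SSL and the default service name; adjust to your setup):
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka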
... View more
08-24-2018
10:36 AM
1 Kudo
I think you will not be able to do this with the SerDe alone. The SerDe reads record by record, which is line by line, and the regex is then applied to the record that has been read, making it impossible to span the pattern over multiple lines. One way to solve it is to create a table with the SerDe using one line per record and then combine the multiple lines via a query in Hive. Another way would be to preprocess the input file outside Hive and write each combined record out as a single line, as needed for your table.
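For the first approach, a table using the built-in RegexSerDe with one line per record could look roughly like this (a sketch; table name, columns, regex and location are placeholders for your actual format):
CREATE EXTERNAL TABLE raw_lines (        -- placeholder table and column names
  ts STRING,
  message STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "^(\\S+)\\s+(.*)$")   -- placeholder pattern
STORED AS TEXTFILE
LOCATION '/data/raw_lines';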
... View more
08-23-2018
01:25 PM
Can you do a klist in the shell with the same user that starts the HBase master (and post the result here)? I am a little curious that the TGT expiry timestamp is exactly the time of your start. Are you using a ticket cache or a keytab for Kerberos? And which principal is configured for HBase?
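For reference, this is roughly what to run as the HBase service user (a sketch; the keytab path and principal are placeholders for your environment):
klist                                                     # shows the cached tickets and their expiry times
klist -kt /etc/security/keytabs/hbase.service.keytab      # placeholder path; lists the principals in the keytab
kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/$(hostname -f)@EXAMPLE.COM   # placeholder realm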
... View more
08-23-2018
10:58 AM
Would be great if you 'accept' the answer if you consider it helpful.
... View more
08-23-2018
10:54 AM
But you still have a column named PK in the column family, I guess (I remember that table was originally migrated from Oracle to HBase)? In that case remove the "primary key" from that column definition. The actual primary key must be outside any column family.
... View more
08-23-2018
07:09 AM
1 Kudo
There might be an application already storing the data in HBase while other people would like to query this data in an SQL manner, or want to combine it with data from other Hive tables. It is also possible that the amount of data being inserted or updated is an argument for using HBase. In principle, HBase has features to handle high volumes of data pretty fast with memory-based processing, while Hive itself is an SQL layer using other storage engines, with the data ending up one way or another in HDFS (or whatever your storage system is). HBase also uses HDFS as the persistence layer, but inserted data is available for queries even before the write to disk takes place. So a typical use case is that data is inserted and updated online in HBase, while someone needs to combine that data with other data in SQL queries. I think it is much less common to insert and update HBase tables only via Hive, but the reasons can vary a lot anyway, e.g. the policies of the ops team, the know-how of the people involved, a cluster having evolved using different tools, established dev or ops procedures, etc.
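For the "combine it with other Hive tables" case, the mapping is done with the HBase storage handler, roughly like this (a sketch; table, column family and column names are placeholders):
CREATE EXTERNAL TABLE hbase_orders (     -- placeholder names throughout
  rowkey STRING,
  amount STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:amount")
TBLPROPERTIES ("hbase.table.name" = "orders");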
... View more
08-23-2018
06:57 AM
An HBase table has only one rowkey, valid for all column families. It is the only primary key, and it can't belong to any column family. So we need to be clear about what you mean by 'I have a column name "PK" which is a primary key belonging to column family "cf"'. You might have a column PK in the column family cf, but certainly not as the rowkey (primary key) in HBase. Within a column family, every column is optional. Can you provide a table description of the HBase table you are trying to map, and the Phoenix view statement?
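You can get the HBase side directly from the shell (a sketch; the table name is taken from your earlier view statement and may differ in your setup):
echo "describe 'FNL.ADDRESSES'" | hbase shell            # shows the column families
echo "scan 'FNL.ADDRESSES', {LIMIT => 1}" | hbase shell  # shows one sample row with its rowkey and columns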
... View more
08-23-2018
06:43 AM
How do you initialize/load the Azure libs? Maybe you also need to configure the working directory in your processor? Do you have any error log? Anyway, it might be a good idea to close this issue (by accepting one answer) and create a new post for the next issue you are facing (your script does not seem to connect correctly).
... View more
08-23-2018
06:38 AM
Would be appreciated if you click on 'accept' for the answer. This lets everyone know the issue is resolved, and it is rewarded by the platform.
... View more