Member since: 05-07-2018
Posts: 331
Kudos Received: 45
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9640 | 09-12-2018 10:09 PM |
| | 3765 | 09-10-2018 02:07 PM |
| | 11554 | 09-08-2018 05:47 AM |
| | 4099 | 09-08-2018 12:05 AM |
| | 4943 | 08-15-2018 10:44 PM |
06-24-2018
06:09 AM
Hi @Satish Anjaneyappa! Did you try running ANALYZE TABLE?
hive> analyze table salaries compute statistics;
Query ID = hive_20180624055914_ad06fa7a-ae16-4658-a4fb-5eabc1c2425a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1529818303832_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.54 s
--------------------------------------------------------------------------------
Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
OK
Time taken: 5.64 seconds
And I have a few questions about your issue:
- What kind of table are we talking about, external or managed? Could you share the DESCRIBE FORMATTED output from your table?
- Are you running Hive on Tez or on MR? And is there any specific file format for this table?
More details about ANALYZE at the link below 🙂
https://cwiki.apache.org/confluence/display/Hive/StatsDev
Hope this helps!
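PS: in case it's useful, statistics can also be gathered per column, and DESCRIBE FORMATTED shows whether they landed. A minimal sketch, reusing the salaries table from my output above:
hive> analyze table salaries compute statistics for columns;
hive> describe formatted salaries;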
06-22-2018
05:29 PM
Hi @VISHAL SINGH! I'm not a specialist in Kafka, but I guess you're hitting https://issues.apache.org/jira/browse/KAFKA-4073
As you can see at the link below, an invalid timestamp is being passed to the ProducerRecord class:
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerRecord.java#L67
If you're able to test a workaround, you can try changing message.timestamp.type in the topic config. Here's mine as an example:
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --describe
Configs for topic 'vini' are
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --alter --add-config message.timestamp.type=LogAppendTime
Completed Updating config for entity: topic 'vini'.
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --describe
Configs for topic 'vini' are message.timestamp.type=LogAppendTime
More details at the link below:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
PS: I'd test this on a non-production topic, so you can roll back the config whenever you want :)
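PS2: if you do need to roll back, kafka-configs.sh can remove the override again; a sketch with the same topic and ZooKeeper quorum as above (not tested here):
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --alter --delete-config message.timestamp.type
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --describe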
Hope this helps! 🙂
06-22-2018
04:11 PM
Hey @Hugo Almeida! Hmm, that's really strange 😞 Let's look at the ORC behaviour: could you check your ORC version? Here's mine:
/usr/hdp/2.6.5.0-292/hive2/lib/hive-orc-2.1.0.2.6.5.0-292.jar
BTW, I'm just wondering whether this issue is blocking your job. If so, you can try changing your datatype to STRING (which is usually the recommended choice anyway). Ah, one last thing: are you using Hive on Tez or Hive on MR? I did my test with Hive on Tez.
Hope this helps!
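PS: a quick way to list the ORC jar on your host; the HDP path below matches my environment, so adjust the version/path to yours:
[root@node3 ~]# ls /usr/hdp/*/hive2/lib/ | grep -i orc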
06-21-2018
09:36 PM
1 Kudo
Hey @Abhinav Phutela! Are you able to telnet to port 50070, or to the other DataNode ports, on those HDFS hosts? I made a test here that you can use as an example:
(screenshot: content of the GetFile processor)
(screenshot: content of the PutHDFS processor)
The flow above picks up hdfs_nifi.txt on the NiFi host and sends it to the HDFS host.
# First, on the HDFS side, send the *-site.xml configs from the hadoop-client/conf dir to the NiFi host
[root@hdfs_node1 conf]$ scp core-site.xml hdfs-site.xml root@nifi_node1:/home/nifi
# Then I'll spread the *-site.xml configs to my other NiFi instances and create the file to put on HDFS
[nifi@nifi_node1 ~]$ echo "testing hdfs content" > hdfs_nifi.txt
[nifi@nifi_node1 ~]$ sudo chown nifi:hadoop *.xml
[nifi@nifi_node1 ~]$ scp core-site.xml hdfs-site.xml nifi@nifi_node2:/home/nifi
[nifi@nifi_node1 ~]$ scp core-site.xml hdfs-site.xml nifi@nifi_node3:/home/nifi
# After I start the flow, the data lands successfully on HDFS
[root@hdfs_node1 conf]$ hdfs dfs -cat /tmp/hdfs_nifi.txt
testing hdfs content
PS: I also changed my /etc/hosts so the NiFi hosts can resolve the hosts of the HDFS cluster.
Hope this helps!
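PS2: for the connectivity check itself, something like this from the NiFi host should be enough (hostnames and ports are just examples; use your NameNode/DataNode addresses):
[nifi@nifi_node1 ~]$ telnet hdfs_node1 50070
[nifi@nifi_node1 ~]$ telnet hdfs_node1 8020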
06-21-2018
07:02 PM
Hi @Karthik Chandrashekhar! Sorry about my delay. Taking a look at your du outputs, it looks like HDFS is doing okay with the DFS total size: if you sum the values from the 16 hosts under /hadoop/hadoop/hdfs/data, it comes to 1.7 TB. Do you have a dedicated mount for /hadoop/hadoop/hdfs/data, or is it all under the / directory? And how many disks do you have? E.g., in my case I have a lab and everything sits under the / directory on one disk.
[root@c1123-node3 hadoop]# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 1.2T 731G 423G 64% /
overlay 1.2T 731G 423G 64% /
tmpfs 126G 0 126G 0% /dev
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/vg01-vsr_lib_docker
1.2T 731G 423G 64% /etc/resolv.conf
/dev/mapper/vg01-vsr_lib_docker
1.2T 731G 423G 64% /etc/hostname
/dev/mapper/vg01-vsr_lib_docker
1.2T 731G 423G 64% /etc/hosts
shm 64M 12K 64M 1% /dev/shm
overlay 1.2T 731G 423G 64% /proc/meminfo
If I run du --max-depth=1 -h /hadoop/hdfs/data on every host, I'll get my DFS usage. And if I run du --max-depth=1 -h / on every host and subtract the value of the HDFS directory, I'll get the total non-DFS usage. So the math, for each disk, would be:
DFS Usage = total du of the HDFS path
Non-DFS Usage = total du - DFS Usage
And answering your last question: the finalized folder is used by HDFS to hold the blocks it has written, so deleting files from it will probably trigger alerts (maybe a block missing a replica or some corrupted block). I completely understand your concern about your storage getting almost full, but if you aren't able to delete any data outside of HDFS, I'd try to delete old and unused files from HDFS (using the hdfs dfs command!), compress any raw data, use file formats with compression enabled, or, as a last resort, lower the replication factor (kindly remember that changing this may cause some problems). Just a friendly reminder: everything under dfs.datanode.data.dir is used internally by HDFS for block storage 🙂 Hope this helps!
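PS: a rough sketch of that math on one host, assuming the data dir sits on the / filesystem as in my lab (paths and prompt are illustrative):
[root@c1123-node3 ~]# dfs_used=$(du -s /hadoop/hdfs/data | awk '{print $1}')
[root@c1123-node3 ~]# total_used=$(df -P / | awk 'NR==2 {print $3}')
[root@c1123-node3 ~]# echo "DFS usage (KB): ${dfs_used}"
[root@c1123-node3 ~]# echo "Non-DFS usage (KB): $((total_used - dfs_used))"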
06-21-2018
06:15 PM
1 Kudo
Olá @Hugo Almeida! Hmm, gotcha. Yeah, our describe extended outputs are pretty much the same. Could you try to execute the following command, please?
[hive@node3 ~]$ hive --orcfiledump hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0
Processing data file hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0 [length: 262]
Structure for hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0
File Version: 0.12 with HIVE_13083
18/06/21 18:10:50 INFO orc.ReaderImpl: Reading ORC rows from hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
18/06/21 18:10:50 INFO orc.RecordReaderImpl: Reader schema not provided -- using file schema struct<_col0:varchar(11)>
Rows: 1
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:varchar(11)>
Stripe Statistics:
Stripe 1:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false min: 1ºTrimestre max: 1ºTrimestre sum: 12
File Statistics:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false min: 1ºTrimestre max: 1ºTrimestre sum: 12
Stripes:
Stripe: offset: 3 data: 21 rows: 1 tail: 46 index: 49
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 38
Stream: column 1 section DATA start: 52 length 15
Stream: column 1 section LENGTH start: 67 length 6
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
File length: 262 bytes
Padding length: 0 bytes
Padding ratio: 0%
_____________________
Hope this helps!
06-21-2018
12:47 AM
1 Kudo
Hi @Hugo Almeida! Ahh, glad to hear that! I'm from Brazil, but living in Chile hehe 🙂 So regarding your Hive issue, we're running almost the same version of Hive:
[root@node3 ~]# hive --version
Hive 1.2.1000.2.6.5.0-292
Could you recreate the ORC table with a bigger varchar, like 12? (There's a rough sketch at the end of this post.) My guess is that somehow your special character is consuming more bytes than it should for ORC files. Anyway, I also ran some other tests against this special char:
hive> select md5('º') from special_char;
OK
91f9bcb8e28b5cb34d63c456efe3a29c
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive> select md5('º') from special_char_text;
OK
91f9bcb8e28b5cb34d63c456efe3a29c
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> select ascii('º') from special_char_text;
OK
-62
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> select ascii('º') from special_char;
OK
-62
Time taken: 0.072 seconds, Fetched: 1 row(s)
Hope this helps!
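PS: the recreate test could look something like this; the new table and column names are just placeholders, so adapt them to your DDL:
hive> create table special_char_v12 stored as orc as select cast(col1 as varchar(12)) as col1 from special_char_text;
hive> select col1, length(col1) from special_char_v12;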
06-20-2018
06:18 PM
Hi @Suresh Dendukuri! Glad to hear that was helpful 🙂 So, getting back to your new problem, I'd kindly ask you to open a new question in HCC (separating different issues helps other HCC users search for a specific problem) 🙂 But just to understand your problem better: is the query listed not working? If so, is NiFi showing an error, or is the result just not what you expected? Thanks
06-20-2018
04:43 PM
Hi @Mateusz Grabowski! Could you enable DEBUG logging? I'm looking for a specific error message:
https://github.com/apache/sqoop/blob/3233db8e1c481e38c538f4caaf55bcbc0c11f208/src/java/org/apache/sqoop/tool/BaseSqoopTool.java#L1203
https://github.com/apache/sqoop/blob/0ca73d4e71bf4724cd7dd15faa108e6ee56ee121/src/java/org/apache/sqoop/util/password/CredentialProviderHelper.java#L101
I'm not sure you'll be able to do this with sqoop export; I didn't see anything about credential providers + --password-alias for export mode in the documentation, but let's investigate it further 🙂 Hope this helps
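PS: to turn up the client-side logging, two things that usually help; this is just a sketch, so keep the rest of your export command exactly as you have it:
[user@edge_node ~]$ export HADOOP_ROOT_LOGGER=DEBUG,console
[user@edge_node ~]$ sqoop export --verbose <the rest of your export options>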
06-20-2018
03:00 AM
Hi @Anji Raju! Hmm, I guess there's a little, almost invisible mistake in your XPath 🙂 Try to change the ReturnPaylLoad in
"column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPaylLoad/ACORD/SignonRq"
to ReturnPayload:
"column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPayload/ACORD/SignonRq"
Hope this helps!
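PS: if the table already exists, something like this should apply the fix without recreating it (the table name here is just a placeholder):
hive> alter table your_xml_table set serdeproperties ("column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPayload/ACORD/SignonRq");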