Member since: 05-07-2018
Posts: 331
Kudos Received: 45
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9640 | 09-12-2018 10:09 PM |
| | 3765 | 09-10-2018 02:07 PM |
| | 11554 | 09-08-2018 05:47 AM |
| | 4099 | 09-08-2018 12:05 AM |
| | 4943 | 08-15-2018 10:44 PM |
06-24-2018
06:09 AM
Hi @Satish Anjaneyappa! Did you try running ANALYZE TABLE?
hive> analyze table salaries compute statistics;
Query ID = hive_20180624055914_ad06fa7a-ae16-4658-a4fb-5eabc1c2425a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1529818303832_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.54 s
--------------------------------------------------------------------------------
Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
OK
Time taken: 5.64 seconds
And I have a few questions about your issue:
- What kind of table are we talking about, external or managed? Could you share the DESCRIBE FORMATTED output from your table?
- Are you running Hive on Tez or on MR? And is there any specific file format for this table?
More details about ANALYZE at the link below 🙂
https://cwiki.apache.org/confluence/display/Hive/StatsDev
Hope this helps!
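PS: in case it's useful, statistics can also be gathered per column, and DESCRIBE FORMATTED shows whether they landed. A minimal sketch, reusing the salaries table from my output above:
hive> analyze table salaries compute statistics for columns;
hive> describe formatted salaries;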
06-22-2018
05:29 PM
Hi @VISHAL SINGH! I'm not a specialist in Kafka, but I guess you're hitting https://issues.apache.org/jira/browse/KAFKA-4073
As you can see at the link below, an invalid timestamp is being passed to the ProducerRecord class:
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerRecord.java#L67
If you're able to test a workaround, you can try changing message.timestamp.type in the topic config. Here's mine as an example:
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --describe
Configs for topic 'vini' are
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --alter --add-config message.timestamp.type=LogAppendTime
Completed Updating config for entity: topic 'vini'.
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --describe
Configs for topic 'vini' are message.timestamp.type=LogAppendTime
More details at the link below:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
PS: I'd test this on a non-production topic, so you can roll back the config whenever you want :)
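PS2: if you do need to roll back, kafka-configs.sh can remove the override again; a sketch with the same topic and ZooKeeper quorum as above (not tested here):
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --alter --delete-config message.timestamp.type
[root@node3 bin]# ./kafka-configs.sh --zookeeper node1:2181,node2:2181,node3:2181 --entity-type topics --entity-name vini --describe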
Hope this helps! 🙂
06-22-2018
04:11 PM
Hey @Hugo Almeida! Hmm, that's really strange 😞 Let's look at the ORC behaviour: could you check your ORC version? Here's mine:
/usr/hdp/2.6.5.0-292/hive2/lib/hive-orc-2.1.0.2.6.5.0-292.jar
BTW, I'm just wondering whether this issue is blocking your job. If so, you can try changing your datatype to STRING (which is usually the recommended choice anyway). Ah, one last thing: are you using Hive on Tez or Hive on MR? I did my test with Hive on Tez.
Hope this helps!
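PS: a quick way to list the ORC jar on your host; the HDP path below matches my environment, so adjust the version/path to yours:
[root@node3 ~]# ls /usr/hdp/*/hive2/lib/ | grep -i orc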
06-21-2018
09:36 PM
1 Kudo
Hey @Abhinav Phutela! Are you able to telnet to port 50070, or to the other DataNode ports, on those HDFS hosts? I made a test here that you can use as an example:
(screenshot: content of the GetFile processor)
(screenshot: content of the PutHDFS processor)
The flow above picks up hdfs_nifi.txt on the NiFi host and sends it to the HDFS host.
# First, on the HDFS side, send the *-site.xml configs from the hadoop-client/conf dir to the NiFi host
[root@hdfs_node1 conf]$ scp core-site.xml hdfs-site.xml root@nifi_node1:/home/nifi
# Then I'll spread the *-site.xml configs to my other NiFi instances and create the file to put on HDFS
[nifi@nifi_node1 ~]$ echo "testing hdfs content" > hdfs_nifi.txt
[nifi@nifi_node1 ~]$ sudo chown nifi:hadoop *.xml
[nifi@nifi_node1 ~]$ scp core-site.xml hdfs-site.xml nifi@nifi_node2:/home/nifi
[nifi@nifi_node1 ~]$ scp core-site.xml hdfs-site.xml nifi@nifi_node3:/home/nifi
# After I start the flow, the data lands successfully on HDFS
[root@hdfs_node1 conf]$ hdfs dfs -cat /tmp/hdfs_nifi.txt
testing hdfs content
PS: I also changed my /etc/hosts so the NiFi hosts can resolve the hosts of the HDFS cluster.
Hope this helps!
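PS2: for the connectivity check itself, something like this from the NiFi host should be enough (hostnames and ports are just examples; use your NameNode/DataNode addresses):
[nifi@nifi_node1 ~]$ telnet hdfs_node1 50070
[nifi@nifi_node1 ~]$ telnet hdfs_node1 8020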
06-21-2018
07:02 PM
Hi @Karthik Chandrashekhar! Sorry about my delay. Taking a look at your du outputs, it looks like HDFS is doing okay with the DFS total size: if you sum the values from the 16 hosts under /hadoop/hadoop/hdfs/data, it comes to 1.7 TB. Do you have a dedicated mount for /hadoop/hadoop/hdfs/data, or is it all under the / directory? And how many disks do you have? E.g., in my case I have a lab and everything sits under the / directory on one disk.
[root@c1123-node3 hadoop]# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 1.2T 731G 423G 64% /
overlay 1.2T 731G 423G 64% /
tmpfs 126G 0 126G 0% /dev
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/vg01-vsr_lib_docker
1.2T 731G 423G 64% /etc/resolv.conf
/dev/mapper/vg01-vsr_lib_docker
1.2T 731G 423G 64% /etc/hostname
/dev/mapper/vg01-vsr_lib_docker
1.2T 731G 423G 64% /etc/hosts
shm 64M 12K 64M 1% /dev/shm
overlay 1.2T 731G 423G 64% /proc/meminfo
If I run du --max-depth=1 -h /hadoop/hdfs/data on every host, I'll get my DFS usage. And if I run du --max-depth=1 -h / on every host and subtract the value of the HDFS directory, I'll get the total non-DFS usage. So the math, for each disk, would be:
DFS Usage = total du of the HDFS path
Non-DFS Usage = total du - DFS Usage
And answering your last question: the finalized folder is used by HDFS to hold the blocks it has written, so deleting files from it will probably trigger alerts (maybe a block missing a replica or some corrupted block). I completely understand your concern about your storage getting almost full, but if you aren't able to delete any data outside of HDFS, I'd try to delete old and unused files from HDFS (using the hdfs dfs command!), compress any raw data, use file formats with compression enabled, or, as a last resort, lower the replication factor (kindly remember that changing this may cause some problems). Just a friendly reminder: everything under dfs.datanode.data.dir is used internally by HDFS for block storage 🙂 Hope this helps!
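PS: a rough sketch of that math on one host, assuming the data dir sits on the / filesystem as in my lab (paths and prompt are illustrative):
[root@c1123-node3 ~]# dfs_used=$(du -s /hadoop/hdfs/data | awk '{print $1}')
[root@c1123-node3 ~]# total_used=$(df -P / | awk 'NR==2 {print $3}')
[root@c1123-node3 ~]# echo "DFS usage (KB): ${dfs_used}"
[root@c1123-node3 ~]# echo "Non-DFS usage (KB): $((total_used - dfs_used))"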
06-21-2018
06:15 PM
1 Kudo
Olá @Hugo Almeida! Hmm, gotcha. Yeah, our describe extended outputs are pretty much the same. Could you try to execute the following command, please?
[hive@node3 ~]$ hive --orcfiledump hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0
Processing data file hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0 [length: 262]
Structure for hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0
File Version: 0.12 with HIVE_13083
18/06/21 18:10:50 INFO orc.ReaderImpl: Reading ORC rows from hdfs://Admin-TrainingNS/apps/hive/warehouse/special_char/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
18/06/21 18:10:50 INFO orc.RecordReaderImpl: Reader schema not provided -- using file schema struct<_col0:varchar(11)>
Rows: 1
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:varchar(11)>
Stripe Statistics:
Stripe 1:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false min: 1ºTrimestre max: 1ºTrimestre sum: 12
File Statistics:
Column 0: count: 1 hasNull: false
Column 1: count: 1 hasNull: false min: 1ºTrimestre max: 1ºTrimestre sum: 12
Stripes:
Stripe: offset: 3 data: 21 rows: 1 tail: 46 index: 49
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 38
Stream: column 1 section DATA start: 52 length 15
Stream: column 1 section LENGTH start: 67 length 6
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
File length: 262 bytes
Padding length: 0 bytes
Padding ratio: 0%
_____________________
Hope this helps!
06-21-2018
12:47 AM
1 Kudo
Hi @Hugo Almeida! Ahh, glad to hear that! I'm from Brazil, but living in Chile hehe 🙂 So regarding your Hive issue, we're running almost the same version of Hive:
[root@node3 ~]# hive --version
Hive 1.2.1000.2.6.5.0-292
Could you recreate the ORC table with a bigger varchar, like 12? (There's a rough sketch at the end of this post.) My guess is that somehow your special character is consuming more bytes than it should for ORC files. Anyway, I also ran some other tests against this special char:
hive> select md5('º') from special_char;
OK
91f9bcb8e28b5cb34d63c456efe3a29c
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive> select md5('º') from special_char_text;
OK
91f9bcb8e28b5cb34d63c456efe3a29c
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> select ascii('º') from special_char_text;
OK
-62
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> select ascii('º') from special_char;
OK
-62
Time taken: 0.072 seconds, Fetched: 1 row(s)
Hope this helps!
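PS: the recreate test could look something like this; the new table and column names are just placeholders, so adapt them to your DDL:
hive> create table special_char_v12 stored as orc as select cast(col1 as varchar(12)) as col1 from special_char_text;
hive> select col1, length(col1) from special_char_v12;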
06-20-2018
06:18 PM
Hi @Suresh Dendukuri! Glad to hear that was helpful 🙂 So, getting back to your new problem, I'd kindly ask you to open a new question in HCC (separating different issues helps other HCC users search for a specific problem) 🙂 But just to understand your problem better: is the query listed not working? If so, is NiFi showing an error, or is the result just not what you expected? Thanks
06-20-2018
04:43 PM
Hi @Mateusz Grabowski! Could you enable DEBUG logging? I'm looking for a specific error message:
https://github.com/apache/sqoop/blob/3233db8e1c481e38c538f4caaf55bcbc0c11f208/src/java/org/apache/sqoop/tool/BaseSqoopTool.java#L1203
https://github.com/apache/sqoop/blob/0ca73d4e71bf4724cd7dd15faa108e6ee56ee121/src/java/org/apache/sqoop/util/password/CredentialProviderHelper.java#L101
I'm not sure you'll be able to do this with sqoop export; I didn't see anything about credential providers + --password-alias for export mode in the documentation, but let's investigate it further 🙂 Hope this helps
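PS: to turn up the client-side logging, two things that usually help; this is just a sketch, so keep the rest of your export command exactly as you have it:
[user@edge_node ~]$ export HADOOP_ROOT_LOGGER=DEBUG,console
[user@edge_node ~]$ sqoop export --verbose <the rest of your export options>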
06-20-2018
03:00 AM
Hi @Anji Raju! Hmm, I guess there's a little, almost invisible mistake in your XPath 🙂 Try to change the ReturnPaylLoad in
"column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPaylLoad/ACORD/SignonRq"
to ReturnPayload:
"column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPayload/ACORD/SignonRq"
Hope this helps!
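PS: if the table already exists, something like this should apply the fix without recreating it (the table name here is just a placeholder):
hive> alter table your_xml_table set serdeproperties ("column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPayload/ACORD/SignonRq");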