Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13413 | 02-20-2018 12:33 PM
 | 1524 | 02-19-2018 05:12 AM
 | 1872 | 12-28-2017 06:13 AM
 | 7169 | 09-28-2017 09:25 AM
 | 12211 | 09-25-2017 11:19 AM
08-10-2017
09:10 AM
@HEMANTH KUMAR RATAKONDA The Spark configuration was not pointing to the right Hadoop configuration directory.
Set the value of HADOOP_CONF_DIR in spark-env.sh. If Spark does not point to the proper Hadoop configuration directory, it can result in a similar error.
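For example (the path below is a common default and may differ on your cluster):

```bash
# In spark-env.sh, point Spark at the Hadoop client configuration directory.
# /etc/hadoop/conf is a typical layout; adjust to wherever your core-site.xml
# and hdfs-site.xml actually live.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```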
08-10-2017
07:17 AM
2 Kudos
Hi @pavan p When you change the block size from one value to another, only the files ingested/created in HDFS after the change will be written with the new block size. The old files remain at the previous block size and are not changed automatically; if you need them on the new block size, manual intervention (rewriting the files) is required. Hope it helps!
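For example, something like this shows the difference (paths and sizes are illustrative):

```bash
# Write a new file with a 256 MB block size (dfs.blocksize is in bytes).
hdfs dfs -D dfs.blocksize=268435456 -put data.csv /user/pavan/new_data.csv

# Compare block sizes: %o prints the block size, %n the file name.
hdfs dfs -stat "%o %n" /user/pavan/old_data.csv
hdfs dfs -stat "%o %n" /user/pavan/new_data.csv

# An old file only picks up the new block size if it is rewritten,
# e.g. copied to a new path with the new setting and then swapped in.
hdfs dfs -D dfs.blocksize=268435456 -cp /user/pavan/old_data.csv /user/pavan/old_data_256m.csv
```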
08-10-2017
07:01 AM
@Hadoop User Could you share the complete log? It will be easier to sort out if the complete log file is available.
08-10-2017
06:56 AM
@Greg Keys Thanks Greg. I got the information I was looking for.
08-09-2017
02:16 PM
@Greg Keys Thanks for the links. I understand that Hadoop clients should be installed on edge nodes, but is that the only use of edge nodes? It was suggested that clients like Sqoop should be on edge nodes because the data transfer rate will be very high; if the clients are on name/data nodes, it might affect those nodes' performance. What I'm not understanding is: if there is no intermediate staging taking place on the edge node, are there any other tasks being performed on the edge nodes when data is transferred from external sources?
08-09-2017
08:31 AM
1 Kudo
Hadoop clients like Sqoop and Hive are installed on edge nodes. A Sqoop job is triggered that captures historical data from an RDBMS residing on a different server that is not mounted. Now my questions are:
1) The edge node will act as a gateway to external sources for this job. Does the data pass through the edge node to the data nodes? Does this mean an intermediate staging layer is created on the edge node which captures the data from the RDBMS? Will there be any difference if the file is transferred from an external source rather than an RDBMS (considering the file is not split at the source)?
2) If the data is stored on the edge node as a staging layer, what happens when the source data is too huge to be stored on the edge node?
I know it's an open-ended question, but a few points would be very helpful.
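For reference, the kind of Sqoop job I have in mind looks roughly like this (host, database, table, and paths are placeholders):

```bash
# Run from the edge node: import a historical table from a remote RDBMS into HDFS.
sqoop import \
  --connect jdbc:mysql://rdbms-host:3306/sales_db \
  --username etl_user -P \
  --table historical_orders \
  --target-dir /data/staging/historical_orders \
  --num-mappers 4
```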
Labels:
- Apache Hadoop
08-08-2017
04:20 PM
@Aaron Dunlap I would approach this hierarchy problem in a different way. Hive has an array data type; let's make use of it rather than keeping a temp table to store the hierarchy data. Perform a group by, then collect the hierarchy field you want into an array. If you need to select a specific level, like the 1st or 2nd, index into the array field based on your need. Hope it helps!!
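A rough sketch of the idea, assuming the hierarchy levels can be collected per key (table and column names are made up):

```bash
# Collect the hierarchy values into an array per key, then index into the
# array to pull out a specific level. Names below are hypothetical.
# Note: collect_list does not guarantee element order; sort first if the
# level position matters.
hive -e "
  SELECT emp_id,
         collect_list(hierarchy_level)    AS levels,
         collect_list(hierarchy_level)[0] AS first_level
  FROM   employee_hierarchy
  GROUP  BY emp_id;
"
```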
07-31-2017
01:38 PM
@Umair Majeed Could you briefly explain how you loaded the external table? Also, did you take count(*) to check the record count, or was it through the stats available in the table properties? What field delimiter and row separator are used in the external table? There is a chance that rows from the source query are getting split into multiple records in the external table. Check the column delimiter and row separator; it might solve your issue.
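For example, something like this can help verify (database and table names are placeholders):

```bash
# Actual row count, rather than the possibly stale table statistics.
hive -e "SELECT COUNT(*) FROM mydb.my_external_table;"

# Shows the ROW FORMAT / FIELDS TERMINATED BY / LINES TERMINATED BY clauses,
# so you can confirm the delimiters match the source data.
hive -e "SHOW CREATE TABLE mydb.my_external_table;"
```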
07-31-2017
11:29 AM
hadoop fs -ls -R / would get the list of directories and their subdirectories. Save it to a file, read it line by line using shell commands, and pass each line as a variable to hadoop fs -count -v -q $linefrompreviouscommand. This would work; a rough sketch is below.
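A minimal sketch of that loop, assuming directory paths contain no spaces (the output file location is arbitrary):

```bash
#!/bin/bash
# List every HDFS directory (lines starting with 'd'), keeping only the path,
# then print the quota report for each one.
hadoop fs -ls -R / | awk '/^d/ {print $NF}' > /tmp/hdfs_dirs.txt

while read -r dir; do
    hadoop fs -count -v -q "$dir"
done < /tmp/hdfs_dirs.txt
```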
07-31-2017
11:25 AM
Hi @Lester Martin I am not sure whether there is a single command to get the quotas for all directories, but I would get the list of HDFS directories and iterate through it with a shell script, appending each quota report to a file, or we could even print it on the screen.