Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13413 | 02-20-2018 12:33 PM
 | 1524 | 02-19-2018 05:12 AM
 | 1872 | 12-28-2017 06:13 AM
 | 7169 | 09-28-2017 09:25 AM
 | 12211 | 09-25-2017 11:19 AM
08-10-2017
09:10 AM
@HEMANTH KUMAR RATAKONDA The Spark configuration was not pointing to the right Hadoop configuration directory.
Set the value of HADOOP_CONF_DIR in spark-env.sh. If Spark does not point to the proper Hadoop configuration directory, it can result in a similar error.
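For example (the path below is a common default and may differ on your cluster):

```bash
# In spark-env.sh, point Spark at the Hadoop client configuration directory.
# /etc/hadoop/conf is a typical layout; adjust to wherever your core-site.xml
# and hdfs-site.xml actually live.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```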
08-10-2017
07:17 AM
2 Kudos
Hi @pavan p When you change the block size from one value to another, only the files ingested/created in HDFS after the change will be written with the new block size. The old files remain at the previous block size and are not changed automatically; if you need them on the new block size, manual intervention (rewriting the files) is required. Hope it helps!
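For example, something like this shows the difference (paths and sizes are illustrative):

```bash
# Write a new file with a 256 MB block size (dfs.blocksize is in bytes).
hdfs dfs -D dfs.blocksize=268435456 -put data.csv /user/pavan/new_data.csv

# Compare block sizes: %o prints the block size, %n the file name.
hdfs dfs -stat "%o %n" /user/pavan/old_data.csv
hdfs dfs -stat "%o %n" /user/pavan/new_data.csv

# An old file only picks up the new block size if it is rewritten,
# e.g. copied to a new path with the new setting and then swapped in.
hdfs dfs -D dfs.blocksize=268435456 -cp /user/pavan/old_data.csv /user/pavan/old_data_256m.csv
```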
08-10-2017
07:01 AM
@Hadoop User Could you share the complete log? It will be easier to sort out if the complete log file is available.
08-10-2017
06:56 AM
@Greg Keys Thanks Greg. I got the information I was looking for.
08-09-2017
02:16 PM
@Greg Keys Thanks for the links. I understand that Hadoop clients should be installed on edge nodes, but is that the only use of edge nodes? It was suggested that clients like Sqoop should be on edge nodes because the data transfer rate will be very high; if the clients are on name/data nodes, it might affect those nodes' performance. What I'm not understanding is: if there is no intermediate staging taking place on the edge node, are there any other tasks being performed on the edge nodes when data is transferred from external sources?
08-09-2017
08:31 AM
1 Kudo
Hadoop clients like Sqoop and Hive are installed on edge nodes. A Sqoop job is triggered that captures historical data from an RDBMS residing on a different server that is not mounted. Now my questions are:
1) The edge node will act as a gateway to external sources for this job. Does the data pass through the edge node to the data nodes? Does this mean an intermediate staging layer is created on the edge node which captures the data from the RDBMS? Will there be any difference if the file is transferred from an external source rather than an RDBMS (considering the file is not split at the source)?
2) If the data is stored on the edge node as a staging layer, what happens when the source data is too huge to be stored on the edge node?
I know it's an open-ended question, but a few points would be very helpful.
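For reference, the kind of Sqoop job I have in mind looks roughly like this (host, database, table, and paths are placeholders):

```bash
# Run from the edge node: import a historical table from a remote RDBMS into HDFS.
sqoop import \
  --connect jdbc:mysql://rdbms-host:3306/sales_db \
  --username etl_user -P \
  --table historical_orders \
  --target-dir /data/staging/historical_orders \
  --num-mappers 4
```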
Labels:
- Apache Hadoop
08-08-2017
04:20 PM
@Aaron Dunlap I would approach this hierarchy problem in a different way. Hive has an array data type; let's make use of it rather than keeping a temp table to store the hierarchy data. Perform a group by, then collect the hierarchy field you want into an array. If you need to select a specific level, like the 1st or 2nd, index into the array field based on your need. Hope it helps!!
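A rough sketch of the idea, assuming the hierarchy levels can be collected per key (table and column names are made up):

```bash
# Collect the hierarchy values into an array per key, then index into the
# array to pull out a specific level. Names below are hypothetical.
# Note: collect_list does not guarantee element order; sort first if the
# level position matters.
hive -e "
  SELECT emp_id,
         collect_list(hierarchy_level)    AS levels,
         collect_list(hierarchy_level)[0] AS first_level
  FROM   employee_hierarchy
  GROUP  BY emp_id;
"
```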
07-31-2017
01:38 PM
@Umair Majeed Could you briefly explain how you loaded the external table? Also, did you take count(*) to check the record count, or was it through the stats available in the table properties? What field delimiter and row separator are used in the external table? There is a chance that rows from the source query are getting split into multiple records in the external table. Check the column delimiter and row separator; it might solve your issue.
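For example, something like this can help verify (database and table names are placeholders):

```bash
# Actual row count, rather than the possibly stale table statistics.
hive -e "SELECT COUNT(*) FROM mydb.my_external_table;"

# Shows the ROW FORMAT / FIELDS TERMINATED BY / LINES TERMINATED BY clauses,
# so you can confirm the delimiters match the source data.
hive -e "SHOW CREATE TABLE mydb.my_external_table;"
```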
07-31-2017
11:29 AM
hadoop fs -ls -R / would get the list of directories and their subdirectories. Save it to a file, read it line by line using shell commands, and pass each line as a variable to hadoop fs -count -v -q $linefrompreviouscommand. This would work; a rough sketch is below.
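A minimal sketch of that loop, assuming directory paths contain no spaces (the output file location is arbitrary):

```bash
#!/bin/bash
# List every HDFS directory (lines starting with 'd'), keeping only the path,
# then print the quota report for each one.
hadoop fs -ls -R / | awk '/^d/ {print $NF}' > /tmp/hdfs_dirs.txt

while read -r dir; do
    hadoop fs -count -v -q "$dir"
done < /tmp/hdfs_dirs.txt
```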
07-31-2017
11:25 AM
Hi @Lester Martin I am not sure whether there is a single command to get the quotas for all directories, but I would get the list of HDFS directories and iterate through it with a shell script, appending each quota report to a file, or we could even print it on the screen.