Member since 06-20-2016
251 Posts
196 Kudos Received
36 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9381 | 11-08-2017 02:53 PM |
| | 1951 | 08-24-2017 03:09 PM |
| | 7548 | 05-11-2017 02:55 PM |
| | 5981 | 05-08-2017 04:16 PM |
| | 1831 | 04-27-2017 08:05 PM |
06-28-2016
03:31 PM
@Ravikumar Kumashi
Make sure your VM is up and that sshd is running and listening on port 2222:
sudo netstat -anp | grep sshd
Also make sure no firewall rules are getting in the way. If both check out, try using 127.0.0.1 instead of localhost. If that doesn't work, edit your hosts file so that sandbox.hortonworks.com resolves to 127.0.0.1, and then use the FQDN sandbox.hortonworks.com instead of localhost.
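To confirm the hosts entry is actually in place, something like the following sketch may help. The function name hosts_maps_to_loopback and the HOSTS_FILE variable are made up for illustration, and the default path is an assumption: on Windows the file normally lives under C:\Windows\System32\drivers\etc\hosts, which Cygwin exposes under /cygdrive/c.

```shell
# Hypothetical helper: succeeds if the given hosts file maps NAME to 127.0.0.1.
hosts_maps_to_loopback() {
  hosts_file=$1
  name=$2
  awk -v name="$name" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break                  # stop at a trailing comment
        if ($i == name) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$hosts_file"
}

# HOSTS_FILE is an assumed variable; point it at your own hosts file.
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
if hosts_maps_to_loopback "$HOSTS_FILE" sandbox.hortonworks.com; then
  echo "sandbox.hortonworks.com resolves to 127.0.0.1"
else
  echo "no 127.0.0.1 entry for sandbox.hortonworks.com in $HOSTS_FILE"
fi
```

This only inspects the file; it does not edit it or test the SSH connection itself.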
06-28-2016
03:07 PM
@Ravikumar Kumashi in Cygwin, you can access the root of your C: drive through the directory /cygdrive/c, so the path would be /cygdrive/c/Users/rnkumashi/Downloads/sample.txt. This is one of the reasons I recommended pscp.
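As a rough illustration of that mapping, here is a small sketch that rewrites a Windows-style path into its /cygdrive form. The function name win_to_cygdrive is hypothetical, and it only handles simple drive-letter paths; inside Cygwin itself the bundled cygpath utility is the more robust choice.

```shell
# Hypothetical helper: convert C:\some\path into /cygdrive/c/some/path.
win_to_cygdrive() {
  # First character is the drive letter; Cygwin uses it in lowercase.
  drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
  # Everything after "C:" with backslashes turned into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

win_to_cygdrive 'C:\Users\rnkumashi\Downloads\sample.txt'
# prints /cygdrive/c/Users/rnkumashi/Downloads/sample.txt
```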
06-28-2016
02:42 PM
@mayki wogno It's essentially non-HDFS data in dfs.datanode.data.dir. This could include log files, intermediate shuffle output from MapReduce jobs, local data files (if you put them on a data node), etc. You can use du or a similar tool to investigate further.
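One possible way to do that du investigation is sketched below. The function name largest_entries and the DATA_DIR variable are placeholders for illustration; substitute whatever directory your dfs.datanode.data.dir setting (or the volume it sits on) points to.

```shell
# Hypothetical helper: list the largest entries (in KB) directly under a directory.
largest_entries() {
  dir=$1
  count=${2:-10}                      # default to the top 10 entries
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# DATA_DIR is an assumed variable; it falls back to the current directory.
largest_entries "${DATA_DIR:-.}" 5
```

Running it level by level (pass a subdirectory from the previous output as the next argument) usually narrows down where the space is going.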
06-28-2016
02:34 PM
@mayki wogno "Non DFS used" can be calculated with the following formula:
Non DFS Used = Configured Capacity - Remaining Space - DFS Used
Note that Configured Capacity = Total Disk Space - Reserved Space. Therefore:
Non DFS Used = (Total Disk Space - Reserved Space) - Remaining Space - DFS Used
Reserved Space is set by the property dfs.datanode.du.reserved.
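To make the arithmetic concrete, here is a small sketch of the formula. The function name non_dfs_used and the sample figures are invented for illustration (think of them as GB); plug in the numbers from your own DataNode report.

```shell
# Hypothetical helper implementing:
#   Non DFS Used = (Total Disk Space - Reserved Space) - Remaining Space - DFS Used
non_dfs_used() {
  total=$1
  reserved=$2
  remaining=$3
  dfs_used=$4
  configured=$(( total - reserved ))          # Configured Capacity
  echo $(( configured - remaining - dfs_used ))
}

# Made-up example: 1000 total, 100 reserved, 500 remaining, 300 DFS used.
non_dfs_used 1000 100 500 300
# prints 100
```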
06-28-2016
02:14 PM
1 Kudo
I would recommend looking into pscp if you are on a Windows platform. If you are using Cygwin, you'll need to install scp; see http://stackoverflow.com/questions/18688502/how-do-i-download-scp-and-ssh-on-cygwin. As noted there, scp is part of the openssh package.
06-28-2016
02:13 PM
1 Kudo
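@alain TSAFACK
val VAL1 = "testcol"
// hiveContext is assumed to be your existing HiveContext instance.
// The single quotes make the interpolated value a string literal rather than a column name.
val df = hiveContext.sql(s"SELECT * FROM src WHERE col1 = '$VAL1'")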
06-28-2016
01:55 PM
@Ravikumar Kumashi yes, that is correct. Run the command from your local machine, since that is where the file you are copying to the sandbox lives. You can invoke scp from Cygwin, or you can use pscp (from the makers of PuTTY) from the Windows command line.
06-28-2016
01:39 PM
@Simran Kaur an example of using Hive for data cleansing is in this article (see section 3.5 in particular). Spark is widely used for extract, transform, and load (ETL) logic and is usually well suited to those kinds of use cases. Both MapReduce and Spark are very general computation paradigms, so it would help to know which data cleaning transformations you have in mind.
06-28-2016
01:23 PM
@Ravikumar Kumashi the scp command is missing the port number. Note that the command returned its "usage" text, which means the syntax was incorrect. Please try specifying 2222 after the -P switch.
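For reference, the corrected command might look something like the sketch below. It only prints the command (a dry run) so nothing is copied until you run the printed line yourself. The user root, the destination /tmp, and the variable and function names are assumptions for illustration; the local path is the Cygwin-style path for the sample file.

```shell
# Placeholder values; adjust to your own setup.
SANDBOX_HOST="127.0.0.1"
SANDBOX_PORT=2222
SANDBOX_USER="root"        # assumed login user
LOCAL_FILE="/cygdrive/c/Users/rnkumashi/Downloads/sample.txt"
REMOTE_DIR="/tmp"          # assumed destination directory

# Hypothetical helper: print the scp command with the port passed via -P (capital P).
scp_command() {
  printf 'scp -P %s %s %s@%s:%s\n' \
    "$SANDBOX_PORT" "$LOCAL_FILE" "$SANDBOX_USER" "$SANDBOX_HOST" "$REMOTE_DIR"
}

scp_command
# prints scp -P 2222 /cygdrive/c/Users/rnkumashi/Downloads/sample.txt root@127.0.0.1:/tmp
```

Keep in mind the switch is an uppercase -P for scp; lowercase -p means something different (preserve file times and modes).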
06-27-2016
06:32 PM
@Bharath Kumar K you may want to look into pscp if you want to run from the command line and resolve network drive mappings in the Windows fashion. I am not sure how you are running #1 (from Cygwin, maybe?), but the syntax in the first example is essentially correct. #3 should work; what error are you receiving? One thing you might try is creating a hosts entry so that sandbox.hortonworks.com resolves to 127.0.0.1, and then using sandbox.hortonworks.com as your hostname/IP.