Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Read csv files from HDFS into R using fread() and grep — lost column names

Solved Go to solution

Read csv files from HDFS into R using fread() and grep — lost column names

I've been trying to read large csv files from HDFS into R using the data.table package since it's a lot faster than the rhdfs package in my experience.

I have been successful with reading entire files with the following commands:

data <- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv"), fill=TRUE

Then, I would like to only read in rows that contain the value "2MS-US". I tried to do it with grep:

data <- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv | grep '2MS-US'"), fill=TRUE)

This returns the correct number of rows, but it removes all the headers. They now become "V1", "V2", etc.

According to this thread, the issue with losing column names when using grep has been resolved in data.package 1.9.6, but I am still experiencing it even though I am using 1.9.7. Any thoughts on this? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Read csv files from HDFS into R using fread() and grep — lost column names

Fixed the issue by using sed instead:

fread("hadoop fs -text /path/to/the/file.csv |sed -n '1p;/2MS-US/p'", fill=TRUE)

The 1p part prints the first line, which are the headers, so this way I was able to keep the headers as well as the rows that match the string.

View solution in original post

1 REPLY 1
Highlighted

Re: Read csv files from HDFS into R using fread() and grep — lost column names

Fixed the issue by using sed instead:

fread("hadoop fs -text /path/to/the/file.csv |sed -n '1p;/2MS-US/p'", fill=TRUE)

The 1p part prints the first line, which are the headers, so this way I was able to keep the headers as well as the rows that match the string.

View solution in original post

Don't have an account?
Coming from Hortonworks? Activate your account here