Member since: 05-19-2016
Posts: 93
Kudos Received: 17
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1368 | 01-30-2017 07:34 AM |
| | 831 | 09-14-2016 10:31 AM |
10-04-2016
05:59 AM
1 Kudo
I am planning to take the HCA (Hortonworks Certified Associate) certification in the next 2 months. I have already checked the exam objectives, and they are vast. I am reading the book Apress.Pro.Apache.Hadoop.2nd.Edition for the certification. Is this sufficient for passing the certification? Can anyone tell me how to prepare for it? Can someone share sample questions asked in the HCA certification?
Tags: hadoop, Hadoop Core
09-27-2016
03:19 PM
@mqureshi Thanks for your reply. When I try with lzo, it throws the error below:

16/09/27 20:45:04 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Format parquet doesn't support compression format lzo
org.kitesdk.data.ValidationException: Format parquet doesn't support compression format lzo

With lzop I do not get any error, but the output is not as expected.
09-27-2016
02:40 PM
1 Kudo
Let's say I have one input file named input.txt with the content below:

Hadoop is good
Hortonworks makes the life easy
Hadoop is a framework

This input.txt contains only 3 lines. Now I need the count of the word "Hortonworks" and the line numbers where it occurs in the input file. For this input file the count of "Hortonworks" is 1 and it occurs on line number 2. I can find these by running a separate MapReduce job for each query. Can we produce both outputs in one MapReduce job? I do not want to run two separate jobs for this purpose; it would double the I/O cost over billions of records.
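Both outputs can come from one job if the mapper emits (word, lineNumber) pairs and the reducer aggregates them. A minimal sketch of that idea, assuming each file is read by exactly one mapper so an in-mapper counter is a true line number; the class names are illustrative, not from the original post:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical single-job solution: the mapper tracks its own line counter,
// which is only a true line number if each file goes to exactly one mapper
// (e.g. small files, or a non-splittable input format).
public class WordCountWithLineNumbers {

  public static class LineNumberMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private long lineNo = 0;

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      lineNo++; // line number within this mapper's file
      for (String word : value.toString().split("\\s+")) {
        context.write(new Text(word), new LongWritable(lineNo));
      }
    }
  }

  public static class CountAndLinesReducer
      extends Reducer<Text, LongWritable, Text, Text> {
    @Override
    protected void reduce(Text word, Iterable<LongWritable> lineNumbers,
        Context context) throws IOException, InterruptedException {
      long count = 0;
      StringBuilder lines = new StringBuilder();
      for (LongWritable line : lineNumbers) {
        count++;
        if (lines.length() > 0) lines.append(',');
        lines.append(line.get());
      }
      // One output row per word, e.g. "Hortonworks  count=1 lines=2"
      context.write(word, new Text("count=" + count + " lines=" + lines));
    }
  }
}
```

For this input the reducer row for "Hortonworks" would be count=1 lines=2; if files can span multiple splits, the line numbers would need a non-splittable input format to stay correct.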
09-27-2016
02:21 PM
I am trying to move data from Oracle to HDFS. The file format is Parquet and the compression is lzo. The Sqoop command below is not working as expected:

sqoop import --connect jdbc:oracle:thin:@**.***.***.***:1521:**** --username **** --password **** --table MyTable -m 1 --target-dir /user/aps/test105 --fields-terminated-by '|' --as-parquetfile --compress --compression-codec lzop

Below is the log:

16/09/27 19:42:56 INFO mapreduce.Job: Running job: job_1474971595874_0032
16/09/27 19:43:04 INFO mapreduce.Job: Job job_1474971595874_0032 running in uber mode : false
16/09/27 19:43:04 INFO mapreduce.Job: map 0% reduce 0%
16/09/27 19:43:11 INFO mapreduce.Job: map 100% reduce 0%
16/09/27 19:43:11 INFO mapreduce.Job: Job job_1474971595874_0032 completed successfully
16/09/27 19:43:11 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=163431
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=6342
HDFS: Number of bytes written=2716
HDFS: Number of read operations=50
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=4340
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4340
Total vcore-seconds taken by all map tasks=4340
Total megabyte-seconds taken by all map tasks=8888320
Map-Reduce Framework
Map input records=25
Map output records=25
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=101
CPU time spent (ms)=3600
Physical memory (bytes) snapshot=360005632
Virtual memory (bytes) snapshot=3532488704
Total committed heap usage (bytes)=299892736
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
16/09/27 19:43:11 INFO mapreduce.ImportJobBase: Transferred 2.6523 KB in 18.3874 seconds (147.7096 bytes/sec)
16/09/27 19:43:11 INFO mapreduce.ImportJobBase: Retrieved 25 records.
edwweb@ctsc00691239901:~/aps$ hadoop fs -ls /user/aps/test105
Found 2 items
drwxr-xr-x - ****** hdfs 0 2016-09-27 19:42 /user/aps/test105/.metadata
-rw-r--r-- 3 ****** hdfs 1279 2016-09-27 19:43 /user/aps/test105/53e90168-e46a-4404-a726-063c533e3db2.parquet

The output is a plain Parquet file; it should be an lzo file. Could you please help?
09-26-2016
01:21 PM
1 Kudo
I am trying to execute a MapReduce job in Java, but it gets stuck in the middle and finally times out. Below is the log:

16/09/26 18:46:42 INFO mapreduce.Job: Running job: job_1474692614849_0070
16/09/26 18:46:50 INFO mapreduce.Job: Job job_1474692614849_0070 running in uber mode : false
16/09/26 18:46:50 INFO mapreduce.Job: map 0% reduce 0%
16/09/26 18:47:01 INFO mapreduce.Job: map 33% reduce 0%
16/09/26 18:52:19 INFO mapreduce.Job: Task Id : attempt_1474692614849_0070_m_000000_0, Status : FAILED
AttemptID:attempt_1474692614849_0070_m_000000_0 Timed out after 300 secs
16/09/26 18:52:20 INFO mapreduce.Job: map 0% reduce 0%
16/09/26 18:52:30 INFO mapreduce.Job: map 33% reduce 0%
16/09/26 18:57:49 INFO mapreduce.Job: Task Id : attempt_1474692614849_0070_m_000000_1, Status : FAILED
AttemptID:attempt_1474692614849_0070_m_000000_1 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/09/26 18:57:50 INFO mapreduce.Job: map 0% reduce 0%
16/09/26 18:57:59 INFO mapreduce.Job: map 33% reduce 0%
16/09/26 19:03:19 INFO mapreduce.Job: Task Id : attempt_1474692614849_0070_m_000000_2, Status : FAILED
AttemptID:attempt_1474692614849_0070_m_000000_2 Timed out after 300 secs
16/09/26 19:03:20 INFO mapreduce.Job: map 0% reduce 0%
16/09/26 19:03:31 INFO mapreduce.Job: map 33% reduce 0%
16/09/26 19:08:50 INFO mapreduce.Job: map 100% reduce 100%
16/09/26 19:08:50 INFO mapreduce.Job: Job job_1474692614849_0070 failed with state FAILED due to: Task failed task_1474692614849_0070_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/09/26 19:08:50 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1311303
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1311303
Total time spent by all reduce tasks (ms)=0
Total vcore-seconds taken by all map tasks=1311303
Total vcore-seconds taken by all reduce tasks=0
Total megabyte-seconds taken by all map tasks=2685548544
Total megabyte-seconds taken by all reduce tasks=0
Is there any way to debug this MapReduce job? Please help.
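For context, "Timed out after 300 secs" usually means the task ran longer than mapreduce.task.timeout (300,000 ms by default) without reporting progress, so the ApplicationMaster killed it. A hedged sketch of a mapper that heartbeats during slow per-record work (the loop body is a placeholder, not this job's actual logic):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative only: shows where to call context.progress() so a slow
// task is not killed by the ApplicationMaster's timeout.
public class ProgressReportingMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String part : value.toString().split(",")) {
      // ... expensive per-part processing would go here ...
      context.progress(); // heartbeat: resets the task timeout clock
    }
    context.write(value, NullWritable.get());
  }
}
```

The timeout can also be raised, e.g. with -Dmapreduce.task.timeout=1800000 on the command line, and the container logs can be pulled with `yarn logs -applicationId <application id>` to see where the task is stuck.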
09-23-2016
03:17 PM
@mqureshi Thanks for your quick response.
09-23-2016
03:05 PM
1 Kudo
I have installed Hortonworks Sandbox 2.4 on my system using VirtualBox, but Spark is not up and running in the sandbox. All other services are working. I have 8 GB RAM and Ubuntu 14.04 LTS. Please help.
09-23-2016
02:56 PM
1 Kudo
I am from a Java background and want to shift my career into the Big Data world. Which one will give me more mileage: Scala or Python?
09-21-2016
12:32 PM
Is there any update? Please help.
09-21-2016
12:09 PM
2 Kudos
@gkeys Add the options below to the command:

-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
-D mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec

The Sqoop command below is working perfectly for me:

sqoop import -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.type=BLOCK -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://***.***.***.***/DATABASE=****** --username ****** --password ****** --table mytable --target-dir /user/aps/test -m 1
09-21-2016
11:08 AM
@Pierre Villard This is working with the -Dmapreduce.job.user.classpath.first=true option. Thanks a lot.
09-21-2016
07:18 AM
@Prashanth Balaiahgari It looks like a security issue. If you are using Kerberos, please obtain a ticket using the kinit command before executing the Sqoop command in the terminal.
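For reference, the programmatic equivalent of kinit in Java uses Hadoop's UserGroupInformation; the principal and keytab path below are placeholders, not values from this thread:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    // Placeholder principal/keytab; roughly equivalent to running
    //   kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM
    UserGroupInformation.loginUserFromKeytab(
        "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab");
    System.out.println("Logged in as "
        + UserGroupInformation.getCurrentUser().getUserName());
  }
}
```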
09-21-2016
07:05 AM
I am trying to execute the Sqoop command below:

sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://***.***.***.***/DATABASE=***** --username ***** --password **** --table mytable --target-dir /user/aps/test2 --as-parquetfile -m 1

Output:

-rw-r--r-- 3 ****** hdfs 0 2016-09-21 12:25 /user/aps/test2/_SUCCESS
-rw-r--r-- 3 ****** hdfs 18 2016-09-21 12:25 /user/aps/test2/part-m-00000

The output above is not in Parquet format. If I use com.teradata.jdbc.TeraDriver it works, but I have to use org.apache.sqoop.teradata.TeradataConnManager for the connection. Please help.
09-15-2016
11:12 AM
The Sqoop commands below are working for me.

For Snappy:

sqoop import -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=****** --username ****** --password **** --table mytable --target-dir /user/aps/test95 -m 1

For BZip2:

sqoop import -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=****** --username ****** --password **** --table mytable --target-dir /user/aps/test96 -m 1

For lzo:

sqoop import -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=****** --username ****** --password **** --table mytable --target-dir /user/aps/test98 -m 1
09-15-2016
10:54 AM
@Nitin Shelke This is working after adding this configuration. Thanks a lot.
09-15-2016
09:59 AM
@Nitin Shelke I have already tried org.apache.hadoop.io.compress.SnappyCodec; it is not working for me.

Sqoop command:

sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=****** --username ****** --password **** --table mytable --target-dir /user/aps/test85 --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec -m 1

Output:

-rw-r--r-- 3 ****** hdfs 0 2016-09-15 13:39 /user/aps/test85/_SUCCESS
-rw-r--r-- 3 ****** hdfs 18 2016-09-15 13:39 /user/aps/test85/part-m-00000

Please help.
09-15-2016
08:28 AM
1 Kudo
I am trying to import data from Teradata to HDFS using both the Teradata connection manager and the JDBC driver. With the JDBC driver it works fine, but with the Teradata connection manager it does not work as expected, and I am not getting any error. Below are the Sqoop commands.

Using the JDBC driver:

sqoop import --driver com.teradata.jdbc.TeraDriver --connect jdbc:teradata://**.***.***.***/DATABASE=****** --username ****** --password **** --table mytable --target-dir /user/aps/test87 --compress -m 1

Output:

-rw-r--r-- 3 ***** hdfs 0 2016-09-15 13:45 /user/aps/test87/_SUCCESS
-rw-r--r-- 3 ***** hdfs 38 2016-09-15 13:45 /user/aps/test87/part-m-00000.gz

Using the Teradata connection manager:

sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=****** --username ****** --password **** --table mytable --target-dir /user/aps/test88 --compress -m 1

Output:

-rw-r--r-- 3 ****** hdfs 0 2016-09-15 13:46 /user/aps/test88/_SUCCESS
-rw-r--r-- 3 ****** hdfs 18 2016-09-15 13:46 /user/aps/test88/part-m-00000

With the Teradata connection manager the output should be a .gz file. Am I doing something wrong? I am facing the same issue for Snappy, Parquet, BZip2, and Avro. Please help.
09-15-2016
06:56 AM
@Steven O'Neill Thanks a lot 🙂. This is working for me.
09-14-2016
03:17 PM
Below is from the YARN log:

2016-09-14 15:49:29,345 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
at org.apache.avro.mapreduce.AvroKeyRecordWriter.<init>(AvroKeyRecordWriter.java:53)
at org.apache.avro.mapreduce.AvroKeyOutputFormat$RecordWriterFactory.create(AvroKeyOutputFormat.java:78)
at org.apache.avro.mapreduce.AvroKeyOutputFormat.getRecordWriter(AvroKeyOutputFormat.java:104)
at com.teradata.connector.hdfs.HdfsAvroOutputFormat.getRecordWriter(HdfsAvroOutputFormat.java:49)
at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.<init>(ConnectorOutputFormat.java:89)
at com.teradata.connector.common.ConnectorOutputFormat.getRecordWriter(ConnectorOutputFormat.java:38)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-09-14 15:49:29,351 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2016-09-14 15:49:29,351 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2016-09-14 15:49:29,352 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
End of LogType:syslog
09-14-2016
02:23 PM
Below are the logs:

16/09/14 19:51:21 INFO mapreduce.Job: Running job: job_1473861945962_0001
16/09/14 19:51:29 INFO mapreduce.Job: Job job_1473861945962_0001 running in uber mode : false
16/09/14 19:51:29 INFO mapreduce.Job: map 0% reduce 0%
16/09/14 19:51:35 INFO mapreduce.Job: map 100% reduce 0%
16/09/14 19:51:35 INFO mapreduce.Job: Job job_1473861945962_0001 completed successfully
16/09/14 19:51:35 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=167348
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=207
HDFS: Number of bytes written=18
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3441
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3441
Total vcore-seconds taken by all map tasks=3441
Total megabyte-seconds taken by all map tasks=7047168
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=207
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=55
CPU time spent (ms)=2600
Physical memory (bytes) snapshot=319619072
Virtual memory (bytes) snapshot=3505516544
Total committed heap usage (bytes)=265289728
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
16/09/14 19:51:35 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor starts at: 1473862895748
16/09/14 19:51:36 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor ends at: 1473862895748
16/09/14 19:51:36 INFO processor.TeradataInputProcessor: the total elapsed time of input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor is: 0s
16/09/14 19:51:36 INFO teradata.TeradataSqoopImportHelper: Teradata import job completed with exit code 0
09-14-2016
02:19 PM
@gkeys I am running it on Ubuntu, not Windows. So is step 2 still required?
09-14-2016
11:48 AM
@Mats Johansson I have tried with capital letters, but I am still facing the same issue.
09-14-2016
11:35 AM
Below is the full stack trace:

16/09/14 15:49:10 INFO mapreduce.Job: Running job: job_1473774257007_0002
16/09/14 15:49:19 INFO mapreduce.Job: Job job_1473774257007_0002 running in uber mode : false
16/09/14 15:49:19 INFO mapreduce.Job: map 0% reduce 0%
16/09/14 15:49:22 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_0, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
16/09/14 15:49:25 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_1, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
16/09/14 15:49:29 INFO mapreduce.Job: Task Id : attempt_1473774257007_0002_m_000000_2, Status : FAILED
Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
16/09/14 15:49:35 INFO mapreduce.Job: map 100% reduce 0%
16/09/14 15:49:36 INFO mapreduce.Job: Job job_1473774257007_0002 failed with state FAILED due to: Task failed task_1473774257007_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/09/14 15:49:36 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8818
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8818
Total vcore-seconds taken by all map tasks=8818
Total megabyte-seconds taken by all map tasks=18059264
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
16/09/14 15:49:36 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor starts at: 1473848376584
16/09/14 15:49:37 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor ends at: 1473848376584
16/09/14 15:49:37 INFO processor.TeradataInputProcessor: the total elapsed time of input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor is: 0s
16/09/14 15:49:37 INFO teradata.TeradataSqoopImportHelper: Teradata import job completed with exit code 1
16/09/14 15:49:37 ERROR tool.ImportTool: Error during import: Import Job failed

Schema:

{
  "type" : "record",
  "namespace" : "avronamespace",
  "name" : "Employee",
  "fields" : [
    { "name" : "Id" , "type" : "string" },
    { "name" : "Name" , "type" : "string" }
  ]
}

Also, my concern is: why is an Avro schema file required here? I am trying to import data from Teradata to HDFS using the Avro file format. Please help.
09-14-2016
11:14 AM
I am trying to import data from Teradata to HDFS using lzo compression. Below is the Sqoop command:

sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=**** --username **** --password **** --table employee --target-dir /user/aps/test3 --compress --compression-codec lzop -m 1

This command executes perfectly, but I get text file output, as shown below:

$ hadoop fs -ls /user/aps/test3
Found 2 items
-rw-r--r-- 3 edwweb hdfs 0 2016-09-14 16:24 /user/aps/test3/_SUCCESS
-rw-r--r-- 3 edwweb hdfs 18 2016-09-14 16:24 /user/aps/test3/part-m-00000

$ hadoop fs -cat /user/aps/test3/part-m-00000
1,Arumoy
2,Manish

The output should be part-m-00000.lzo instead of part-m-00000. Am I doing anything wrong?
09-14-2016
10:43 AM
Thanks a lot. This is working for lower case.
09-14-2016
10:41 AM
@Pierre Villard Thanks a lot. This is working now with lower case.
09-14-2016
10:31 AM
@Pierre Villard I am getting the error below now:

Error: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter

I have avro-mapred-1.7.5-hadoop2.jar and avro-1.7.5.jar in my $SQOOP_HOME/lib. Please help.