Member since: 08-08-2016
Posts: 17
Kudos Received: 1
Solutions: 0
01-29-2019
03:28 PM
I submitted the Pig script below, passing parameters, and got the following end-of-file exception in Tez. What could be the cause, and how can I fix it?

pig -param "DATE=$DATE" -param "LOCATION=$LOCATION" -param "TABLE_LOCATION=$TABLE_LOCATION" -x tez -useHCatalog -f /lake/T/S/r11.pig

[PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Cannot submit DAG - Application id: application_1472313221699_4775 End of File Exception between local host is: "hostlaenapp02.app.test.foocorp.net/172.16.13.75"; destination host is: "hostldnapp06.app.test.foocorp.net":41450; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
Labels:
- Apache Hadoop
- Apache Pig
- Apache Tez
02-21-2018
10:11 PM
I found the answer: build the (old_id, new_id) DataFrame in PySpark, left-join it with itself, split the matched from the unmatched rows, and keep looping until df.take(1) == []. The corrected code:
---------------------------------------------------------------------------
from pyspark.sql.functions import col

df = df_tr.select(col("old_id"), col("new_id")).distinct()
df2 = df                                          # the full mapping, re-joined every pass
df_tr = spark.createDataFrame([], df.schema)      # accumulates the resolved pairs

while df.take(1) != []:
    # follow new_id -> old_id one more hop
    df = df.alias("df1").join(df2.alias("df2"),
                              col("df1.new_id") == col("df2.old_id"),
                              "left_outer")
    # rows with no further hop are final
    df_null = df.filter(col("df2.new_id").isNull()) \
                .select(col("df1.old_id"), col("df1.new_id"))
    # rows with another hop carry the newer id into the next pass
    df = df.filter(col("df2.new_id").isNotNull()) \
           .select(col("df1.old_id"), col("df2.new_id").alias("new_id"))
    df_tr = df_tr.union(df_null)
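For illustration only (this is my own sketch, not part of the original answer): the same loop run end-to-end on a few (old_id, new_id) pairs taken from the id_track table in my question below, assuming a local SparkSession; the names spark, mapping, resolved and done are mine.
---------------------------------------------------------------------------
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[*]").getOrCreate()

# a few (old_id, new_id) pairs from the id_track table in the question below
mapping = spark.createDataFrame(
    [(101, 103), (103, 105), (105, 106), (102, 108), (108, 116)],
    ["old_id", "new_id"])

df = mapping.distinct()
df2 = mapping                                    # full mapping, re-joined every pass
resolved = spark.createDataFrame([], df.schema)  # accumulates finished pairs

while df.take(1) != []:
    joined = df.alias("df1").join(df2.alias("df2"),
                                  col("df1.new_id") == col("df2.old_id"),
                                  "left_outer")
    done = joined.filter(col("df2.new_id").isNull()) \
                 .select(col("df1.old_id"), col("df1.new_id"))
    df = joined.filter(col("df2.new_id").isNotNull()) \
               .select(col("df1.old_id"), col("df2.new_id").alias("new_id"))
    resolved = resolved.union(done)

resolved.show()   # each old_id ends at its terminal id, e.g. 101 -> 106, 102 -> 116
---------------------------------------------------------------------------
With these sample pairs the loop finishes after three passes.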
07-14-2017
02:15 AM
I have a table, id_track history, in which ids are updated to new ids at different time stamps. I want to consolidate each id into its latest id by an iterative lookup in SQL. How can I do this in Hive or Pig?
Table:
OLD_ID NEW_ID TIME-STAMP
101 103 1/5/2001
102 108 2/5/2001
103 105 3/5/2001
105 106 4/5/2001
110 111 4/5/2001
108 116 14/5/2001
112 117 4/6/2001
104 118 4/7/2001
111 119 4/8/2001
Desired result table:
OLD_ID LATEST_ID LAST TIME-STAMP
101 106 4/5/2001
102 116 14/5/2001
104 118 4/7/2001
110 111 4/5/2001
112 117 4/6/2001
111 119 4/8/2001
Labels:
- Apache Hive
- Apache Pig
- Apache Spark
10-20-2016
02:41 PM
Thank you Emily for your reply.
10-14-2016
04:31 PM
@mqureshi I am still unable to export blanks as blanks.
10-14-2016
04:30 PM
Data:
a||c
|b|c
a|b|
a|\N\c
a|N|c
a|\\N|c
a|NULL|c
Table:
CREATE SET TABLE panda.test_tera2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
col VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
col2 CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
col3 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( col );
---------------------------
Command:
sqoop export --connect jdbc:teradata://xxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxtemp/teradata2 --table test_tera2 --input-fields-terminated-by '|' -m 2
-------------------------
Result:
col col2 col3
1 a ? c
2 ? b c
3 a \N\c ?
4 a N c
5 a b ?
6 a NULL c
7 a \\N c
------------------------------------------------
Command:
sqoop export --connect jdbc:teradata://xx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xx/teradata2 --table test_tera2 --input-fields-terminated-by '|' -m 2 --input-null-non-string "\\N"
RESULT:
col col2 col3
1 a ? c
2 ? b c
3 a \N\c ?
4 a N c
5 a b ?
6 a NULL c
7 a \\N c
------------------------------------------------
Command:
sqoop export --connect jdbc:teradata://xx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xx/teradata2 --table test_tera2 --input-fields-terminated-by '|' -m 2 --input-null-non-string "\\N" --input-null-string "\\N"
RESULT:
col col2 col3
1 a ? c
2 ? b c
3 a \N\c ?
4 a N c
5 a b ?
6 a NULL c
7 a \\N c
--------------------------------------------------
I am still unable to export blank fields (||, nothing between the pipes) as blanks into Teradata with Sqoop export.
Could someone explain the reason and a way to solve it?
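One crude workaround, not something tried in this thread: pre-pad the empty pipe-delimited fields with a single space before running sqoop export, so the connector never receives an empty string. The pad_empty_fields helper below is hypothetical, and it stores ' ' rather than a true empty string (which pads the same way in a CHAR column but differs in a VARCHAR), so it may not be acceptable.
---------------------------------------------------------------------------
# Hypothetical pre-export step (not from this thread): replace empty fields
# with a single space so the connector no longer sees an empty string.
def pad_empty_fields(line, sep="|"):
    return sep.join(field if field != "" else " " for field in line.split(sep))

# quick checks against the sample rows above
assert pad_empty_fields("a||c") == "a| |c"
assert pad_empty_fields("|b|c") == " |b|c"
assert pad_empty_fields("a|b|") == "a|b| "
---------------------------------------------------------------------------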
10-14-2016
03:35 AM
Thank you, I already tried that, but it converts blanks into nulls. I want blanks kept as blanks in the Pig output stored in HDFS, like a,,b,,
10-14-2016
01:02 AM
1 Kudo
I have Pig output in HDFS like the following:
a,,b,,
I created the target table as:
CREATE SET TABLE panda.test ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
col VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
col2 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
col3 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
col4 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( col );
When I export it with any of the following commands:
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2 --input-null-string "\\N" --input-null-non-string "\\N"
or
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2
or
sqoop export --connect jdbc:teradata://xxxxx/database=xx --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xx --password xx --export-dir /test/xxx/xx/temp/teradata1 --table test --input-fields-terminated-by ',' -m 2 --input-null-string "\\" --input-null-non-string "\\"
I get null values in place of the blanks in the Teradata table. Does anyone have a suggestion so that I can export a blank as a blank into Teradata?
Labels:
- Apache Hadoop
- Apache Pig
- Apache Sqoop
08-09-2016
03:16 AM
Thank you @Lester Martin. Your blog is wonderful. I am checking my dataset.
08-08-2016
01:24 AM
a = LOAD '601' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' USING org.apache.hive.hcatalog.pig.HCatLoader();
c = LOAD '603' USING org.apache.hive.hcatalog.pig.HCatLoader();
d = LOAD 'SKL' USING org.apache.hive.hcatalog.pig.HCatLoader();
e = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd), c BY (p1_key, c_cd), d BY (p2_key, c_cd);
DUMP e;
========================================================================
If I do the same joins in Hive, I get output. In Pig, dumping e runs MapReduce and reads rows, but writes no output even though the job succeeds. If I do the same thing in Hive with nested inner joins, I get the correct result. Can anyone explain what goes wrong in Pig when joining relations on different key names?
Another thing: when I try to store e with HCatStorer (HCatalog) into an empty partitioned table using dynamic partitioning (without specifying the partition value), I get a partitioned-table error. I don't know the reason for the error in HCatalog. If you have faced the same thing, please explain it and suggest a solution.
Labels:
- Apache HCatalog
- Apache Hive
- Apache Pig