Member since 05-07-2018
331 Posts
45 Kudos Received
35 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
  | 7056 | 09-12-2018 10:09 PM
  | 2743 | 09-10-2018 02:07 PM
  | 9356 | 09-08-2018 05:47 AM
  | 3089 | 09-08-2018 12:05 AM
  | 4113 | 08-15-2018 10:44 PM
07-16-2018
05:47 AM
Hi @Laeeq Ahmad! Okay, so this time you're having issues with another FQDN, right? The one before was complaining about ip-172-31-32-138.us-west-2.compute.internal, and now it's temp.tem1.org. So let's check whether the NodeManager host (as set in Ambari) matches the output of:

cat /etc/sysconfig/network
cat /etc/hosts
hostname --fqdn

Now regarding the warn/error messages:

143 => AFAIK this error is usually related to memory misconfiguration; take a look at this link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html Also, through Ambari it's possible to "set the recommendation" for most of the parameters 🙂

Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. => Try adding the sticky bit to your yarn.nodemanager.remote-app-log-dir.

154 => Perhaps this link explains what's going on here: https://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/

PS: whenever the NodeManager crashes, check that the PID file in /var/run/hadoop-yarn/yarn/ didn't get stuck.

Hope this helps!
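One more thing: here's a minimal sketch of fixing the /app-logs permissions. The path and the yarn:hadoop ownership are my assumptions based on the defaults, so check yarn.nodemanager.remote-app-log-dir in your yarn-site.xml first:

# Run as the HDFS superuser; path and ownership below are assumed defaults
hdfs dfs -chown yarn:hadoop /app-logs   # owner the NodeManager expects
hdfs dfs -chmod 1777 /app-logs          # 777 plus the sticky bit (leading 1)
hdfs dfs -ls -d /app-logs               # should now show drwxrwxrwt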
07-16-2018
02:59 AM
That's excellent news @Shane B! 😄
07-15-2018
07:49 AM
Hi Mani! This time I'll give my personal opinion: I really prefer to choose the "best fit" for each table. For example, find the best number of mappers for each kind of table, decide whether it needs compression or a special WHERE clause, or transform a datatype from the DB into another in Hadoop. About the performance issues, I'd guess it's hard to get more performance out of import-all-tables than out of importing each table separately, especially with a large number of tables like in your case, and even more so if you're planning to run ETL over them. Like I said, it's just my humble opinion. Also, you can take a look at the documentation; there are some rules for using import-all-tables: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal

- Each table must have a single-column primary key.
- You must intend to import all columns of each table.
- You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.

Another good reason to import table by table (as sketched below) is that you may run into trouble while importing all tables at once: debugging will take longer, and if you attempt to fix a possible issue on the last tables, you may get bored waiting for the whole job to finish just to see whether the fix worked 😄 Hope this helps!
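PS: just to illustrate the per-table tuning I mean, here's a minimal sketch (the connection string, table name, and column values are made up, adjust them to your environment):

# Hypothetical per-table import: custom mappers, compression and a WHERE clause
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop -P \
  --table orders \
  --where "order_date >= '2018-01-01'" \
  --num-mappers 8 \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --target-dir /user/hive/warehouse/orders

None of these per-table knobs (--table, --where, --target-dir) are available with import-all-tables.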
07-15-2018
07:21 AM
1 Kudo
Hello @Bal P! Hmm, that's kind of strange. One thing made me curious: you tried to create the database pointing to /home/abc/Sample, but the DB ended up with another path, hdfs://abcdhdfs/apps/hive/warehouse/practice.db. Did you try to drop the database and recreate it?
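If you do, here's a minimal sketch of the drop/recreate via Beeline (the JDBC URL and the HDFS location are my assumptions, point them at your own values, and note that CASCADE drops any tables inside the database):

# Recreate the DB with an explicit HDFS location instead of a local path
beeline -u 'jdbc:hive2://<hiveserver2-host>:10000/default' -n hive \
  -e "DROP DATABASE IF EXISTS practice CASCADE;" \
  -e "CREATE DATABASE practice LOCATION 'hdfs://abcdhdfs/user/abc/Sample';"

Keep in mind that LOCATION is resolved against the default filesystem (HDFS), not the local disk. Hope this helps!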
07-15-2018
06:34 AM
Good to know you made some progress there 🙂 Okay, is there any error code? Usually YARN throws a number or a few lines of classes/methods. Could you share the output from the logs? Thanks.
07-15-2018
05:33 AM
Hello @Laeeq Ahmad! Could you check the output from the following command?

netstat -tunlp | grep 8042

There are a couple of things that may help us find the issue (see the sketch below):
- Check if you have a firewall enabled
- Check your FQDN and whether it matches ip-172-31-32-138.us-west-2.compute.internal
- Take a look at the NodeManager logs in /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<hostname>.log and try to find any ERROR/WARN
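A minimal sketch of those checks (I'm assuming RHEL/CentOS 7 with firewalld; on older systems try service iptables status instead):

# 1) Is a firewall up and possibly blocking 8042?
systemctl status firewalld
# 2) FQDN as the OS reports it -- compare with what Ambari/YARN expects
hostname --fqdn
# 3) Recent ERROR/WARN lines from the NodeManager log (filename is per-host)
grep -E 'ERROR|WARN' /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-$(hostname).log | tail -20

Hope this helps!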
07-14-2018
06:23 AM
Hello @Anji Raju! Try this out:

#My XML content (don't forget to close your DOB tag!)
<FormResponse>
<Customers>
<customer_details>
<First_name>Anji</First_name>
<Last_name> Raju</Last_name>
<DOB>06/24/1278</DOB>
<Addr1> 14 duck st </Addr1>
<City> boston </City>
<State> OH </State>
<Country> USA </Country>
</customer_details>
<customer_details>
<First_name>Jeet</First_name>
<Last_name> Anu</Last_name>
<DOB>06/24/1279</DOB>
<Addr1> tuttles groove </Addr1>
<City> denver </City>
<State> CA </State>
<Country> USA </Country>
</customer_details>
<customer_details>
<First_name>Test1</First_name>
<Last_name> Test_last</Last_name>
<DOB>006/24/1280</DOB>
<Addr1> Sleek street </Addr1>
<City> cali </City>
<State> MA </State>
<Country> USA </Country>
</customer_details>
</Customers>
</FormResponse>
#DDL for the table (same as you)
CREATE EXTERNAL TABLE default.customer_details (
Customers array<struct<customer_details:struct<
First_name:string,
Last_name:string,
DOB:string,
Addr1:string,
City:string,
State:string,
Country:string>>>
)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties
(
"column.xpath.Customers" = "/FormResponse/Customers/customer_details"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'hdfs://Admin-TrainingNS/user/hive/warehouse/customer_details'
TBLPROPERTIES (
"xmlinput.start"="<FormResponse>",
"xmlinput.end"="</FormResponse>"
);
#Query to extract columns
select inline(Customers.customer_details) from default.customer_details;
OK
Anji Raju 06/24/1278 14 duck st boston OH USA
Jeet Anu 06/24/1279 tuttles groove denver CA USA
Test1 Test_last 006/24/1280 Sleek street cali MA USA
Time taken: 0.08 seconds, Fetched: 3 row(s)

Hope this helps! 🙂
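PS: in case it's useful, this is how the sample file gets into the table location (the customers.xml filename is my own choice, any name works):

# Upload the XML above into the table's LOCATION so Hive can read it
hdfs dfs -put customers.xml hdfs://Admin-TrainingNS/user/hive/warehouse/customer_details/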
07-13-2018
08:30 PM
Good to know @Kumar Veerappan! 🙂
07-13-2018
03:15 PM
Hi @Shane B! I made the same test here using your example, and it's working 😐

[hive@node3 ~]$ cat -A test-avro.avsc
{$
"namespace": "com.linkedin.haivvreo",$
"name": "test_serializer",$
"type": "record",$
"fields": [$
{ "name":"string1", "type":"string" },$
{ "name":"int1", "type":"int" },$
{ "name":"tinyint1", "type":"int" },$
{ "name":"smallint1", "type":"int" },$
{ "name":"bigint1", "type":"long" },$
{ "name":"boolean1", "type":"boolean" },$
{ "name":"float1", "type":"float" },$
{ "name":"double1", "type":"double" },$
{ "name":"list1", "type":{"type":"array", "items":"string"} },$
{ "name":"map1", "type":{"type":"map", "values":"int"} },$
{ "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [$
{ "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },$
{ "name":"union1", "type":["float", "boolean", "string"] },$
{ "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },$
{ "name":"nullableint", "type":["int", "null"] },$
{ "name":"bytes1", "type":"bytes" },$
{ "name":"fixed1", "type":{"type":"fixed", "name":"threebytes", "size":3} }$
] }$
[hive@node3 ~]$ beeline -u 'jdbc:hive2://node3:10000/default' -n hive
Connecting to jdbc:hive2://node3:10000/default
Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292)
Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.5.0-292 by Apache Hive
0: jdbc:hive2://node3:10000/default> CREATE TABLE test_hcc
0: jdbc:hive2://node3:10000/default> ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default> STORED as AVRO
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'avro.schema.url'='file:///home/hive/test-avro.avsc');
No rows affected (0.513 seconds)
0: jdbc:hive2://node3:10000/default> desc formatted test_hcc;
+-------------------------------+--------------------------------------------------------------+-----------------------------------+--+
| col_name | data_type | comment |
+-------------------------------+--------------------------------------------------------------+-----------------------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| string1 | string | |
| int1 | int | |
| tinyint1 | int | |
| smallint1 | int | |
| bigint1 | bigint | |
| boolean1 | boolean | |
| float1 | float | |
| double1 | double | |
| list1 | array<string> | |
| map1 | map<string,int> | |
| struct1 | struct<sint:int,sboolean:boolean,sstring:string> | |
| union1 | uniontype<float,boolean,string> | |
| enum1 | string | |
| nullableint | int | |
| bytes1 | binary | |
| fixed1 | binary | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| Owner: | hive | NULL |
| CreateTime: | Fri Jul 13 14:28:18 UTC 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://Admin-TrainingNS/apps/hive/warehouse/test_hcc | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | avro.schema.url | file:///home/hive/test-avro.avsc |
| | numFiles | 0 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 0 |
| | transient_lastDdlTime | 1531492098 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.avro.AvroSerDe | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-------------------------------+--------------------------------------------------------------+-----------------------------------+--+
47 rows selected (0.48 seconds)
Sorry, I'm getting curious, so I'm going to ask a lot of questions 😄 Which Hive version are you running? Are you using the Hive CLI or Beeline to execute these commands? Could you try to execute the following commands?

E.g. Beeline:
0: jdbc:hive2://node3:10000/default> !sh ls test-avro.avsc

E.g. Hive CLI:
hive> !ls /home/hive/test-avro.avsc

I made a test here to reproduce your issue (by adding a "d" to my filename) and got the same error:

hive> CREATE TABLE test_hcc2
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED as AVRO
> TBLPROPERTIES (
> 'avro.schema.url'='file:///home/hive/test-avrod.avsc');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem: Unable to read schema from given path: file:///home/hive/test-avrod.avsc)
Also, I took a look at the Hive GitHub and it seems that you're hitting this line: https://github.com/apache/hive/blob/cacb1c09574c89ac07fcffc0b8c3fad18e283aec/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java#L139

BTW, I'm attaching my Hive props so you can compare with yours 🙂 hive.txt

Hope this helps!
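PS: a couple of quick checks on the schema file itself may be worth running too (paths taken from your example; sudo access is an assumption on my side):

# Is the file there, and can the user running HiveServer2 actually read it?
ls -l /home/hive/test-avro.avsc
sudo -u hive cat /home/hive/test-avro.avsc > /dev/null && echo readable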
07-12-2018
09:45 PM
Hello @Shane B! Hmm, I guess something inside your .avsc file can't be serialized by the Avro SerDe. Could you share it with us? BTW, are you able to create the table using the HDFS path?
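In the meantime, one quick sanity check: the .avsc must at least be valid JSON, which you can verify with Python's stock json.tool (the path is my assumption):

# Pretty-prints the schema if it's valid JSON, otherwise points at the bad spot
python -m json.tool /home/hive/test-avro.avsc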