Member since 05-07-2018
331 Posts
45 Kudos Received
35 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
  | 7056 | 09-12-2018 10:09 PM
  | 2743 | 09-10-2018 02:07 PM
  | 9356 | 09-08-2018 05:47 AM
  | 3089 | 09-08-2018 12:05 AM
  | 4113 | 08-15-2018 10:44 PM
07-16-2018
05:47 AM
Hi @Laeeq Ahmad! Okay, so this time you're having issues with another FQDN, right? The one before was complaining about ip-172-31-32-138.us-west-2.compute.internal, and now it's temp.tem1.org. So let's check whether the NodeManager host (as set in Ambari) matches the output of:

cat /etc/sysconfig/network
cat /etc/hosts
hostname --fqdn

Now regarding the warn/error messages:

143 => AFAIK this error is usually related to memory misconfiguration; take a look at this link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html Also, through Ambari it's possible to "set the recommendation" for most of the parameters 🙂

Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. => Try adding the sticky bit to your yarn.nodemanager.remote-app-log-dir.

154 => Perhaps this link explains what's going on here: https://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/

PS: whenever the NodeManager crashes, check that the PID file in /var/run/hadoop-yarn/yarn/ didn't get stuck.

Hope this helps!
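One more thing: here's a minimal sketch of fixing the /app-logs permissions. The path and the yarn:hadoop ownership are my assumptions based on the defaults, so check yarn.nodemanager.remote-app-log-dir in your yarn-site.xml first:

# Run as the HDFS superuser; path and ownership below are assumed defaults
hdfs dfs -chown yarn:hadoop /app-logs   # owner the NodeManager expects
hdfs dfs -chmod 1777 /app-logs          # 777 plus the sticky bit (leading 1)
hdfs dfs -ls -d /app-logs               # should now show drwxrwxrwt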
07-16-2018
02:59 AM
That's excellent news @Shane B! 😄
07-15-2018
07:49 AM
Hi Mani! This time I'll give my personal opinion: I really prefer to choose the "best fit" for each table. For example, find the best number of mappers for each kind of table, decide whether it needs compression or a special WHERE clause, or transform a datatype from the DB into another in Hadoop. About the performance issues, I'd guess it's hard to get more performance out of import-all-tables than out of importing each table separately, especially with a large number of tables like in your case, and even more so if you're planning to run ETL over them. Like I said, it's just my humble opinion. Also, you can take a look at the documentation; there are some rules for using import-all-tables: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal

- Each table must have a single-column primary key.
- You must intend to import all columns of each table.
- You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.

Another good reason to import table by table (as sketched below) is that you may run into trouble while importing all tables at once: debugging will take longer, and if you attempt to fix a possible issue on the last tables, you may get bored waiting for the whole job to finish just to see whether the fix worked 😄 Hope this helps!
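PS: just to illustrate the per-table tuning I mean, here's a minimal sketch (the connection string, table name, and column values are made up, adjust them to your environment):

# Hypothetical per-table import: custom mappers, compression and a WHERE clause
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop -P \
  --table orders \
  --where "order_date >= '2018-01-01'" \
  --num-mappers 8 \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --target-dir /user/hive/warehouse/orders

None of these per-table knobs (--table, --where, --target-dir) are available with import-all-tables.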
07-15-2018
07:21 AM
1 Kudo
Hello @Bal P! Hmm, that's kind of strange. One thing made me curious: you tried to create the database pointing to /home/abc/Sample, but the DB ended up with another path, hdfs://abcdhdfs/apps/hive/warehouse/practice.db. Did you try to drop the database and recreate it?
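If you do, here's a minimal sketch of the drop/recreate via Beeline (the JDBC URL and the HDFS location are my assumptions, point them at your own values, and note that CASCADE drops any tables inside the database):

# Recreate the DB with an explicit HDFS location instead of a local path
beeline -u 'jdbc:hive2://<hiveserver2-host>:10000/default' -n hive \
  -e "DROP DATABASE IF EXISTS practice CASCADE;" \
  -e "CREATE DATABASE practice LOCATION 'hdfs://abcdhdfs/user/abc/Sample';"

Keep in mind that LOCATION is resolved against the default filesystem (HDFS), not the local disk. Hope this helps!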
07-15-2018
06:34 AM
Good to know you made some progress there 🙂 Okay, is there any error code? Usually YARN throws a number or a few lines of classes/methods. Could you share the output from the logs? Thanks.
07-15-2018
05:33 AM
Hello @Laeeq Ahmad! Could you check the output from the following command?

netstat -tunlp | grep 8042

There are a couple of things that may help us find the issue (see the sketch below):
- Check if you have a firewall enabled
- Check your FQDN and whether it matches ip-172-31-32-138.us-west-2.compute.internal
- Take a look at the NodeManager logs in /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<hostname>.log and try to find any ERROR/WARN
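A minimal sketch of those checks (I'm assuming RHEL/CentOS 7 with firewalld; on older systems try service iptables status instead):

# 1) Is a firewall up and possibly blocking 8042?
systemctl status firewalld
# 2) FQDN as the OS reports it -- compare with what Ambari/YARN expects
hostname --fqdn
# 3) Recent ERROR/WARN lines from the NodeManager log (filename is per-host)
grep -E 'ERROR|WARN' /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-$(hostname).log | tail -20

Hope this helps!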
07-14-2018
06:23 AM
Hello @Anji Raju! Try this out:

#My XML content (don't forget to close your DOB tag!)
<FormResponse>
<Customers>
<customer_details>
<First_name>Anji</First_name>
<Last_name> Raju</Last_name>
<DOB>06/24/1278</DOB>
<Addr1> 14 duck st </Addr1>
<City> boston </City>
<State> OH </State>
<Country> USA </Country>
</customer_details>
<customer_details>
<First_name>Jeet</First_name>
<Last_name> Anu</Last_name>
<DOB>06/24/1279</DOB>
<Addr1> tuttles groove </Addr1>
<City> denver </City>
<State> CA </State>
<Country> USA </Country>
</customer_details>
<customer_details>
<First_name>Test1</First_name>
<Last_name> Test_last</Last_name>
<DOB>006/24/1280</DOB>
<Addr1> Sleek street </Addr1>
<City> cali </City>
<State> MA </State>
<Country> USA </Country>
</customer_details>
</Customers>
</FormResponse>
#DDL for the table (same as you)
CREATE EXTERNAL TABLE default.customer_details (
Customers array<struct<customer_details:struct<
First_name:string,
Last_name:string,
DOB:string,
Addr1:string,
City:string,
State:string,
Country:string>>>
)
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties
(
"column.xpath.Customers" = "/FormResponse/Customers/customer_details"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'hdfs://Admin-TrainingNS/user/hive/warehouse/customer_details'
TBLPROPERTIES (
"xmlinput.start"="<FormResponse>",
"xmlinput.end"="</FormResponse>"
);
#Query to extract columns
select inline(Customers.customer_details) from default.customer_details;
OK
Anji Raju 06/24/1278 14 duck st boston OH USA
Jeet Anu 06/24/1279 tuttles groove denver CA USA
Test1 Test_last 006/24/1280 Sleek street cali MA USA
Time taken: 0.08 seconds, Fetched: 3 row(s)

Hope this helps! 🙂
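PS: in case it's useful, this is how the sample file gets into the table location (the customers.xml filename is my own choice, any name works):

# Upload the XML above into the table's LOCATION so Hive can read it
hdfs dfs -put customers.xml hdfs://Admin-TrainingNS/user/hive/warehouse/customer_details/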
07-13-2018
08:30 PM
Good to know @Kumar Veerappan! 🙂
07-13-2018
03:15 PM
Hi @Shane B! I made the same test here using your example, and it's working 😐

[hive@node3 ~]$ cat -A test-avro.avsc
{$
"namespace": "com.linkedin.haivvreo",$
"name": "test_serializer",$
"type": "record",$
"fields": [$
{ "name":"string1", "type":"string" },$
{ "name":"int1", "type":"int" },$
{ "name":"tinyint1", "type":"int" },$
{ "name":"smallint1", "type":"int" },$
{ "name":"bigint1", "type":"long" },$
{ "name":"boolean1", "type":"boolean" },$
{ "name":"float1", "type":"float" },$
{ "name":"double1", "type":"double" },$
{ "name":"list1", "type":{"type":"array", "items":"string"} },$
{ "name":"map1", "type":{"type":"map", "values":"int"} },$
{ "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [$
{ "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },$
{ "name":"union1", "type":["float", "boolean", "string"] },$
{ "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },$
{ "name":"nullableint", "type":["int", "null"] },$
{ "name":"bytes1", "type":"bytes" },$
{ "name":"fixed1", "type":{"type":"fixed", "name":"threebytes", "size":3} }$
] }$
[hive@node3 ~]$ beeline -u 'jdbc:hive2://node3:10000/default' -n hive
Connecting to jdbc:hive2://node3:10000/default
Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292)
Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.5.0-292 by Apache Hive
0: jdbc:hive2://node3:10000/default> CREATE TABLE test_hcc
0: jdbc:hive2://node3:10000/default> ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default> STORED as AVRO
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'avro.schema.url'='file:///home/hive/test-avro.avsc');
No rows affected (0.513 seconds)
0: jdbc:hive2://node3:10000/default> desc formatted test_hcc;
+-------------------------------+--------------------------------------------------------------+-----------------------------------+--+
| col_name | data_type | comment |
+-------------------------------+--------------------------------------------------------------+-----------------------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| string1 | string | |
| int1 | int | |
| tinyint1 | int | |
| smallint1 | int | |
| bigint1 | bigint | |
| boolean1 | boolean | |
| float1 | float | |
| double1 | double | |
| list1 | array<string> | |
| map1 | map<string,int> | |
| struct1 | struct<sint:int,sboolean:boolean,sstring:string> | |
| union1 | uniontype<float,boolean,string> | |
| enum1 | string | |
| nullableint | int | |
| bytes1 | binary | |
| fixed1 | binary | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| Owner: | hive | NULL |
| CreateTime: | Fri Jul 13 14:28:18 UTC 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://Admin-TrainingNS/apps/hive/warehouse/test_hcc | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | avro.schema.url | file:///home/hive/test-avro.avsc |
| | numFiles | 0 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 0 |
| | transient_lastDdlTime | 1531492098 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.avro.AvroSerDe | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-------------------------------+--------------------------------------------------------------+-----------------------------------+--+
47 rows selected (0.48 seconds)
Sorry, I'm getting curious, so I'm going to ask a lot of questions 😄 Which Hive version are you running? Are you using the Hive CLI or Beeline to execute these commands? Could you try to execute the following commands?

E.g. Beeline:
0: jdbc:hive2://node3:10000/default> !sh ls test-avro.avsc

E.g. Hive CLI:
hive> !ls /home/hive/test-avro.avsc

I made a test here to reproduce your issue (by adding a "d" to my filename) and got the same error:

hive> CREATE TABLE test_hcc2
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED as AVRO
> TBLPROPERTIES (
> 'avro.schema.url'='file:///home/hive/test-avrod.avsc');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem: Unable to read schema from given path: file:///home/hive/test-avrod.avsc)
Also, I took a look at the Hive GitHub and it seems that you're hitting this line: https://github.com/apache/hive/blob/cacb1c09574c89ac07fcffc0b8c3fad18e283aec/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java#L139

BTW, I'm attaching my Hive props so you can compare with yours 🙂 hive.txt

Hope this helps!
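PS: a couple of quick checks on the schema file itself may be worth running too (paths taken from your example; sudo access is an assumption on my side):

# Is the file there, and can the user running HiveServer2 actually read it?
ls -l /home/hive/test-avro.avsc
sudo -u hive cat /home/hive/test-avro.avsc > /dev/null && echo readable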
07-12-2018
09:45 PM
Hello @Shane B! Hmm, I guess something inside your .avsc file can't be serialized by the Avro SerDe. Could you share it with us? BTW, are you able to create the table using the HDFS path?
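In the meantime, one quick sanity check: the .avsc must at least be valid JSON, which you can verify with Python's stock json.tool (the path is my assumption):

# Pretty-prints the schema if it's valid JSON, otherwise points at the bad spot
python -m json.tool /home/hive/test-avro.avsc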