Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1346 | 11-26-2019 11:47 PM |
|  | 1304 | 11-25-2019 11:44 AM |
|  | 9481 | 08-07-2019 12:48 AM |
|  | 2183 | 04-17-2019 03:09 AM |
|  | 3497 | 02-18-2019 12:23 AM |
08-26-2018
09:46 PM
What OS are you using?
08-26-2018
09:33 PM
Hi, you don't have to UNION 60 times; you can unpivot with a cross join instead:

select t.rowid, t.orderdate, t.shipmode, t.customername, t.state, m.metric,
       case m.metric when 'sales' then t.sales when 'quantity' then t.quantity end as value
from mytable t
cross join ( select 'sales' metric union all select 'quantity' metric ) m
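The same cross-join unpivot pattern works on any SQL engine; here is a minimal sketch using SQLite from Python, with a hypothetical two-row stand-in for the table (only the sales and quantity columns are kept):

```python
import sqlite3

# In-memory stand-in for "mytable" (hypothetical sample data).
con = sqlite3.connect(":memory:")
con.execute("create table mytable (id int, sales real, quantity int)")
con.executemany("insert into mytable values (?, ?, ?)", [(1, 9.5, 3), (2, 4.0, 7)])

# Cross-joining against a two-row "metric" derived table turns the two
# columns into two rows per source row.
rows = con.execute("""
    select t.id, m.metric,
           case m.metric when 'sales' then t.sales
                         when 'quantity' then t.quantity end as value
    from mytable t
    cross join (select 'sales' metric union all select 'quantity' metric) m
    order by t.id, m.metric
""").fetchall()
print(rows)
# -> [(1, 'quantity', 3), (1, 'sales', 9.5), (2, 'quantity', 7), (2, 'sales', 4.0)]
```

With 60 metrics, the derived table simply grows to 60 rows; the base table is still scanned once.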
08-24-2018
01:28 AM
Hi, you can inspect Avro files with the avro-tools utility.

create table work.test_avro ( i int, s string ) stored as avro;
insert into work.test_avro select 1, "abc";
set hive.exec.compress.output = true;
set hive.exec.compress.intermediate = true;
set avro.output.codec = snappy;
insert into work.test_avro select 2, "abcdefgb";

This table now contains two files, one compressed with Snappy and one uncompressed; you can check them with the getmeta command:

$ avro-tools getmeta 000000_0
avro.schema {"type":"record","name":"test_avro","namespace":"work","fields":[{"name":"i","type":["null","int"],"default":null},{"name":"s","type":["null","string"],"default":null}]}
$ avro-tools getmeta 000000_0_copy_1
avro.schema {"type":"record","name":"test_avro","namespace":"work","fields":[{"name":"i","type":["null","int"],"default":null},{"name":"s","type":["null","string"],"default":null}]}
avro.codec snappy
08-24-2018
12:53 AM
Hi, if you are using a Cloudera Manager deployed cluster with parcels, add the new host to the list of hosts and then deploy the YARN and Spark GATEWAY roles on this node. This will make CM distribute the parcels to this edge node and "activate" it. After that you should have the following commands on PATH: spark-submit, spark-shell (or spark2-submit, spark2-shell if you deployed SPARK2_ON_YARN). If you are using Kerberos, make sure you have the client libraries and a valid krb5.conf file, and make sure you have a valid ticket in your cache. Then, to submit a Spark job to YARN:

spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

or

spark-submit --class path.to.your.Class --master yarn --deploy-mode client [options] <app jar> [app options]
08-24-2018
12:37 AM
1 Kudo
As @GeKas said, enable ACLs on HDFS (if you don't have them already: dfs.namenode.acls.enabled should be checked). Then you need to set the default group access for the parent directory, so every new subdirectory will also be accessible by the mapred user (assuming mapred is in the hadoop group):

hdfs dfs -setfacl -R -m default:group:hadoop:r-x /user/history
hdfs dfs -setfacl -R -m group:hadoop:r-x /user/history

And try it again.
08-23-2018
06:43 AM
1 Kudo
You are probably hitting an OOM, or maybe an overloaded system. Do you have any warnings about memory overcommitment (i.e., how much memory the node has available for the OS, YARN, Impala, etc.)?
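As a rough sanity check you can compare the sum of per-service memory reservations on a node against its physical RAM, which is essentially what the CM overcommitment warning does. A sketch, where the service names and figures are entirely hypothetical; substitute your actual role settings (e.g. yarn.nodemanager.resource.memory-mb, the Impala daemon mem_limit):

```python
# Hypothetical per-node memory reservations in GiB.
reservations = {
    "os": 4,
    "yarn_nodemanager": 24,
    "impala_daemon": 16,
    "hdfs_datanode": 4,
}
physical_ram_gib = 32

total = sum(reservations.values())
overcommitted = total > physical_ram_gib
print(f"reserved {total} GiB of {physical_ram_gib} GiB -> overcommitted: {overcommitted}")
# -> reserved 48 GiB of 32 GiB -> overcommitted: True
```

If the total exceeds physical RAM, the kernel OOM killer can take out processes under load even though each service individually looks healthy.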
08-23-2018
04:47 AM
Hi, I think it is related to snapshots or hidden directories. Maybe distcp was preparing a snapshot and, as it failed, it left these temporary objects in HDFS.
08-23-2018
02:41 AM
1 Kudo
I understand. But how often do you create and drop a table with 180k partitions? It is a matter of a simple script. But maybe you are right, the metastore should handle bigger timeouts.
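The "simple script" could, for example, drop the partitions in batches before dropping the table itself, so no single metastore call has to touch all 180k partitions at once. A sketch that only generates the DDL; the table name and partition column (dt) are hypothetical:

```python
def drop_partition_batches(partitions, table="mytable", batch_size=1000):
    """Yield ALTER TABLE statements, each dropping at most batch_size partitions."""
    for i in range(0, len(partitions), batch_size):
        batch = partitions[i:i + batch_size]
        specs = ", ".join(f"partition (dt='{p}')" for p in batch)
        yield f"alter table {table} drop if exists {specs};"

# Example: 2500 hypothetical partition values -> 3 statements.
parts = [f"p{i:04d}" for i in range(2500)]
stmts = list(drop_partition_batches(parts))
print(len(stmts))
# -> 3
```

Each statement can then be fed to beeline or the Hive CLI; because every call is small, it stays well under the metastore client timeout.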
08-23-2018
02:31 AM
1 Kudo
I assume the type mismatch is because you have not defined the else branch of the if expression; without it, the result type is inferred as Any ("Type mismatch, expected: Seq[String], actual: Array[Any]"). Also note that the field's dataType should be compared against StringType, not String:

val regExpr = yearDF.schema.fields.map(x => if (x.dataType == StringType) { your_regex(x) } else { some_expression_returning_string })
yearDF.selectExpr(regExpr: _*)
08-23-2018
02:06 AM
Hi, try to log into the metastore database and manually remove the table from the metadata tables. I think the table containing tables is TBLS. You should also remove the records from its child tables, such as the ones holding columns and locations. Then restart the metastore and it should be OK. As this is an external table, this action will not remove the data.