Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1331 | 11-26-2019 11:47 PM |
 | 1285 | 11-25-2019 11:44 AM |
 | 9373 | 08-07-2019 12:48 AM |
 | 2138 | 04-17-2019 03:09 AM |
 | 3431 | 02-18-2019 12:23 AM |
08-09-2016
06:06 AM
Hi, after removing HBase from a CDH cluster and deleting the /hbase directory in HDFS and ZooKeeper (which was a pain due to ACLs), I created a new HBase service and added it to the cluster via the wizard. Everything worked fine, but after about 2 minutes I got a weird error: Cloudera Manager reports that the HBase Master health is bad - (Availability: Unknown, Health: Good). The health test is bad because the Service Monitor did not find an active Master. On the other hand, the HBase Master shows green under Instances and reports "1 Good Health" in the Status Summary. I checked the HBase Master page: there is no issue, nothing in the logs. I ran hbase hbck - everything is fine. I created a table - it was created without any issue. Any suggestions as to what can cause the BAD health? Thanks
07-29-2016
02:19 AM
Any update on this? Have you created the JIRA? Thanks
07-27-2016
01:34 AM
In Hive the solution is:

```sql
select id, ix, locations.key, locations.val
from (
  select id, ix, locs
  from mytable
  lateral view posexplode( locations ) x as ix, locs
) tab
lateral view explode( locs ) locations as key, val;
```

which returns the correct result set:
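(The result set itself is not included here. Derived from the two sample rows inserted in the post below, not from actual query output, it should look roughly like this, with ix giving the position of each map inside the locations array:)

id | ix | key | val
---|---|---|---
10001 | 0 | Location1.City | Bratislava
10001 | 0 | Location1.Country | SK
10001 | 0 | Location1.LAT | 41
10001 | 1 | Location2.City | Kosice
10001 | 1 | Location2.Country | SK
10001 | 1 | Location2.LAT | 42
10002 | 0 | Location1.City | Wien
... | ... | ... | ...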
07-27-2016
12:43 AM
Hi, thanks for the response. Yes, it seems to be a bug. You can test it yourself; here are the scripts to reproduce (in Hive, "i" is just a dummy table):

```sql
create table mytable( id int, locations array<map<string,string>> ) stored as parquet;

insert into table mytable
select 10001, array(
  map("Location1.City","Bratislava","Location1.Country","SK","Location1.LAT","41"),
  map("Location2.City","Kosice","Location2.Country","SK","Location2.LAT","42") )
from i limit 1;

insert into table mytable
select 10002, array(
  map("Location1.City","Wien","Location1.Country","AT","Location1.LAT","40"),
  map("Location2.City","Graz","Location2.Country","AT","Location2.LAT","40") )
from i limit 1;
```

Query results in Impala: these results are wrong. What I need is a location_number: 0 for Location1.City, 0 for Location1.Country, 0 for Location1.LAT, 1 for Location2.City, 1 for Location2.Country, 1 for Location2.LAT, and so on for every ID in the table. What you suggested (adding l.item) returned a wrong result set, because City appears twice in the result set but LAT is missing. This seems to be definitely a BUG.
07-26-2016
09:10 AM
Hi, how can I access the position of an element (in my case a MAP) in an array column?

Table DDL:

```sql
CREATE TABLE mytable (
  id int,
  locations ARRAY< MAP< string, string > >
)
```

When I query the table with the SQL below, I get the position of the KEY-VALUE pair INSIDE the MAP, but what I really want to access is the position in the locations ARRAY, e.g. 1st location, 2nd location and so on.

```sql
select c.id, a.pos, a.key, a.value
from mytable c
left join c.locations a
order by c.id, a.pos;
```

Thanks
Labels:
- Apache Impala
07-13-2016
08:01 AM
In the meantime I figured out one possible solution, which seems to be stable and does not run out of memory: the HiveContext has to be created outside, in a singleton object.
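A minimal sketch of what such a singleton might look like (the object name HiveContextSingleton is my own placeholder, not from the original post); it lazily creates one HiveContext per JVM and reuses it from every foreachRDD batch:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper: lazily create a single HiveContext per JVM and reuse it,
// instead of constructing a new one in every foreachRDD batch.
object HiveContextSingleton {
  @transient private var instance: HiveContext = _

  def getInstance(sc: SparkContext): HiveContext = {
    if (instance == null) {
      instance = new HiveContext(sc)
    }
    instance
  }
}

// Usage inside the streaming job:
// smSplitted.foreachRDD { rdd =>
//   val hiveContext = HiveContextSingleton.getInstance(rdd.sparkContext)
//   val smDF = hiveContext.createDataFrame(rdd, smStruct)
//   smDF.registerTempTable("sm")
//   hiveContext.sql("insert into table onlinetblsm select * from sm")
// }
```

This mirrors the lazily instantiated SQLContext singleton pattern shown in the Spark Streaming programming guide, just with a HiveContext so that writes to permanent Hive tables work.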
06-23-2016
07:55 AM
2 Kudos
Hello, I tried to make a simple application in Spark Streaming which reads new data from HDFS every 5 seconds and simply inserts it into a Hive table. On the official Spark web site I found an example of how to perform SQL operations on DStream data via the foreachRDD function, but the catch is that the example uses sqlContext and transforms the data from an RDD to a DataFrame. The problem is that with this DF the data cannot be saved (appended) to an existing permanent Hive table; a HiveContext has to be created. So I tried the program below. It works, but fails after a while because it runs out of memory, since it creates a new HiveContext object every time. I tried to create the HiveContext BEFORE the map and broadcast it, but it failed. I tried to call getOrCreate, which works fine with sqlContext but not with hiveContext. Any ideas? Thanks, Tomas

```scala
val sparkConf = new SparkConf().setAppName("StreamHDFSdata")
sparkConf.set("spark.dynamicAllocation.enabled","false")
val ssc = new StreamingContext(sparkConf, Seconds(5))
ssc.checkpoint("/user/hdpuser/checkpoint")
val sc = ssc.sparkContext

val smDStream = ssc.textFileStream("/user/hdpuser/data")
val smSplitted = smDStream.map( x => x.split(";") ).map( x => Row.fromSeq( x ) )
val smStruct = StructType( (0 to 10).toList.map( x => "col"+x.toString).map( y => StructField( y, StringType, true ) ) )

//val hiveCx = new org.apache.spark.sql.hive.HiveContext(sc)
//val sqlBc = sc.broadcast( hiveCx )

smSplitted.foreachRDD( rdd => {
  //val sqlContext = SQLContext.getOrCreate(rdd.sparkContext) --> sqlContext cannot be used for permanent table create
  val sqlContext = new org.apache.spark.sql.hive.HiveContext(rdd.sparkContext)
  //val sqlContext = sqlBc.value --> THIS DOES NOT WORK: fail during runtime
  //val sqlContext = new HiveContext.getOrCreate(rdd.sparkContext) --> THIS DOES NOT WORK EITHER: fail during runtime
  //import hiveCx.implicits._
  val smDF = sqlContext.createDataFrame( rdd, smStruct )
  //val smDF = rdd.toDF
  smDF.registerTempTable("sm")
  val smTrgPart = sqlContext.sql("insert into table onlinetblsm select * from sm")
  smTrgPart.write.mode(SaveMode.Append).saveAsTable("onlinetblsm")
} )
```
Labels:
- Apache Hive
- Apache Spark
- HDFS
06-10-2016
05:44 AM
Actually this has already been resolved: we changed the create table statement and added #b (hash b, meaning binary) to the column mapping.

```sql
create external table md_extract_file_status (
  table_key string,
  fl_counter bigint
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,colfam:FL_Counter#b')
TBLPROPERTIES ('hbase.table.name' = 'HBTABLE');
```
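As a quick sanity check (my suggestion, not part of the original post), querying the mapped table should now return the counter as a readable bigint instead of NULL:

```sql
select table_key, fl_counter
from md_extract_file_status
limit 10;
```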
05-24-2016
01:26 PM
HBase has a nice feature called counter increment, where you can atomically increment a value and get back the result. I want to create a simple external table over this HBase table, but I don't know how to choose the correct data type for Hive/Impala. The value of colfam:FL_Lock looks like this:

20160512_000006 column=colfam:FL_Lock, timestamp=1464120550634, value=\x00\x00\x00\x00\x00\x00\x00\x00

If I create the external table with string, the query returns nothing and no error. If I create the external table with bigint/decimal/int, the query returns NULL and an ERROR from Impala:

Error converting column colfam:FL_Lock: '' TO INT

Any ideas how to map this HBase column correctly? Thanks, Tomas
Labels:
- Apache HBase
05-07-2016
01:07 PM
Hi, is it a bug or a desired feature that create external table (or changing the location of an external table) is allowed only for serveradmin roles? Based on the documentation, ALL permission on the database should be sufficient, but there is also a statement that the URI must be accessible. However, when I change my test user's permissions and remove serveradmin, he cannot create an external table pointing to his home directory, like this:

```sql
create table part ( i int, s string )
stored as textfile
location '/user/testuser/part';
```

ERROR: AuthorizationException: User 'testuser@MYREALM.LOCAL' does not have privileges to access: hdfs://hdfscluster/user/testuser/part

After enabling the serveradmin right for testuser, the command executes correctly and the table is created. Any hints on this? Thanks
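For reference, with Sentry the LOCATION clause needs a URI privilege in addition to ALL on the database, which matches the documentation statement mentioned above. A sketch of granting it (the role name testuser_role is my own placeholder):

```sql
-- testuser_role is a hypothetical role name; the URI should match the LOCATION used in the DDL.
GRANT ALL ON URI 'hdfs://hdfscluster/user/testuser/part' TO ROLE testuser_role;
```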
Labels:
- HDFS