Member since 07-01-2015
460 Posts
78 Kudos Received
43 Solutions
My Accepted Solutions
Views | Posted
---|---
1359 | 11-26-2019 11:47 PM
1309 | 11-25-2019 11:44 AM
9516 | 08-07-2019 12:48 AM
2193 | 04-17-2019 03:09 AM
3519 | 02-18-2019 12:23 AM
08-09-2016 06:06 AM
Hi,

After removing HBase from a CDH cluster and deleting the /hbase directory in HDFS and ZooKeeper (which was a pain due to ACLs), I added a new HBase service to the cluster via the wizard. Everything worked fine, but after 2 minutes I got a weird error: Cloudera Manager reports that the HBase Master health is bad (Availability: Unknown, Health: Good), with the message "This health test is bad because the Service Monitor did not find an active Master."

On the other hand, the HBase Master shows green in Instances and reports "1 Good Health" in the Status Summary. I checked the HBase Master page and there is no issue, and there is nothing in the logs. I ran hbase hbck and everything is fine. I also created a table without any issue.

Any suggestions as to what could cause the bad health? Thanks
07-29-2016 02:19 AM
Any update on this? Have you created the JIRA? Thanks
07-27-2016 01:34 AM
In Hive the solution is:

select id, ix, locations.key, locations.val
from (
  select id, ix, locs
  from mytable
  lateral view posexplode( locations ) x as ix, locs
) tab
lateral view explode( locs ) locations as key, val;

which returns the correct result set:
07-27-2016 12:43 AM
Hi, thanks for the response. Yes, it seems to be a bug. You can test it yourself; here are the scripts to reproduce it (in Hive; "i" is just a dummy table):

create table mytable( id int, locations array< map<string,string>>) stored as parquet;

insert into table mytable select 10001, array(
  map("Location1.City","Bratislava","Location1.Country","SK","Location1.LAT","41"),
  map("Location2.City","Kosice","Location2.Country","SK","Location2.LAT","42")
) from i limit 1;

insert into table mytable select 10002, array(
  map("Location1.City","Wien","Location1.Country","AT","Location1.LAT","40"),
  map("Location2.City","Graz","Location2.Country","AT","Location2.LAT","40")
) from i limit 1;

Query results in Impala:

These results are wrong. What I need is a location_number:

0 for Location1.City
0 for Location1.Country
0 for Location1.LAT
1 for Location2.City
1 for Location2.Country
1 for Location2.LAT

... and so on for every ID in the table. What you suggested (adding l.item) returned a wrong result set, because City appears twice in the result set but LAT is missing. This definitely seems to be a bug:
07-26-2016 09:10 AM
Hi, how can I access the position of an element (in my case, a MAP) in an array column?

Table DDL:

CREATE TABLE mytable ( id int, locations ARRAY< MAP< string, string> > )

When I query the table with the SQL below, I get the position of the KEY-VALUE pair INSIDE the MAP, but what I really want is the position in the locations ARRAY, e.g. 1st location, 2nd location and so on.

select c.id, a.pos, a.key, a.value
from mytable c left join c.locations a
order by c.id, a.pos;

Thanks
Labels: Apache Impala
07-13-2016 08:01 AM
In the meantime I figured out one possible solution, which seems to be stable and does not run out of memory: the HiveContext has to be created outside foreachRDD, in a singleton object, so that every batch reuses the same instance.
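The singleton idea can be sketched as follows. This is a minimal illustration in Python, not the original Scala code: `ContextHolder` and `make_context` are hypothetical names, and the factory is only a stand-in for the expensive `HiveContext` constructor, so the sketch runs without Spark.

```python
import threading


class ContextHolder:
    """Lazily creates one shared context and hands out the same object afterwards."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_or_create(cls, factory):
        # Double-checked locking: take the lock only while the instance is missing.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = factory()
        return cls._instance


created = []


def make_context():
    # Hypothetical stand-in for an expensive constructor such as a HiveContext.
    ctx = object()
    created.append(ctx)
    return ctx


# Simulate three micro-batches: each one asks the holder for the context.
contexts = [ContextHolder.get_or_create(make_context) for _ in range(3)]
```

Under these assumptions the factory runs once, however many batches arrive. In the streaming job, the call inside foreachRDD would ask the holder for its instance instead of constructing a new context per batch.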
06-23-2016 07:55 AM
2 Kudos
Hello,

I tried to build a simple Spark Streaming application that reads new data from HDFS every 5 seconds and simply inserts it into a Hive table. On the official Spark website I found an example of how to perform SQL operations on DStream data via the foreachRDD function, but the catch is that the example uses sqlContext and transforms the data from an RDD to a DataFrame. The problem is that with this DataFrame the data cannot be saved (appended) to an existing permanent Hive table; a HiveContext has to be created.

So I tried the program below. It works, but fails after a while because it runs out of memory: it creates a new HiveContext object every time. I tried to create the HiveContext BEFORE the map and broadcast it, but it failed. I tried to call getOrCreate, which works fine with sqlContext but not with hiveContext.

Any ideas? Thanks, Tomas

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.{Row, SaveMode, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val sparkConf = new SparkConf().setAppName("StreamHDFSdata")
sparkConf.set("spark.dynamicAllocation.enabled", "false")
val ssc = new StreamingContext(sparkConf, Seconds(5))
ssc.checkpoint("/user/hdpuser/checkpoint")
val sc = ssc.sparkContext
val smDStream = ssc.textFileStream("/user/hdpuser/data")
val smSplitted = smDStream.map( x => x.split(";") ).map( x => Row.fromSeq( x ) )
val smStruct = StructType( (0 to 10).toList.map( x => "col" + x.toString ).map( y => StructField( y, StringType, true ) ) )
//val hiveCx = new org.apache.spark.sql.hive.HiveContext(sc)
//val sqlBc = sc.broadcast( hiveCx )
smSplitted.foreachRDD( rdd => {
  //val sqlContext = SQLContext.getOrCreate(rdd.sparkContext) --> sqlContext cannot be used for permanent table create
  val sqlContext = new org.apache.spark.sql.hive.HiveContext(rdd.sparkContext)
  //val sqlContext = sqlBc.value --> THIS DOES NOT WORK: fail during runtime
  //val sqlContext = new HiveContext.getOrCreate(rdd.sparkContext) --> THIS DOES NOT WORK EITHER: fail during runtime
  //import hiveCx.implicits._
  val smDF = sqlContext.createDataFrame( rdd, smStruct )
  //val smDF = rdd.toDF
  smDF.registerTempTable("sm")
  val smTrgPart = sqlContext.sql("insert into table onlinetblsm select * from sm")
  smTrgPart.write.mode(SaveMode.Append).saveAsTable("onlinetblsm")
} )
Labels: Apache Hive, Apache Spark, HDFS
06-10-2016 05:44 AM
Actually, this has already been resolved: we changed the create table statement and added #b (hash b, for binary) to the column mapping.

create external table md_extract_file_status (
  table_key string,
  fl_counter bigint
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,colfam:FL_Counter#b')
TBLPROPERTIES ('hbase.table.name' = 'HBTABLE');
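A minimal sketch of why the binary mapping matters, assuming the counter cell holds what an HBase increment writes: an 8-byte big-endian signed long. The function name `decode_hbase_counter` is hypothetical; the sample bytes are the cell value shown in the original question.

```python
import struct


def decode_hbase_counter(raw: bytes) -> int:
    """Interpret an 8-byte cell value as a big-endian signed 64-bit integer."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack(">q", raw)[0]


# Cell value from the HBase shell output: eight zero bytes.
raw_value = b"\x00" * 8
counter = decode_hbase_counter(raw_value)

# Read as text, the same bytes are eight NUL characters, not a string of digits,
# so a reader that expects a textual number has nothing it can parse.
looks_like_digits = raw_value.decode("utf-8").isdigit()
```

Here `counter` is 0 and `looks_like_digits` is False, which is consistent with the empty string result and the conversion error reported for the non-binary mappings.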
05-24-2016 01:26 PM
HBase has a nice feature called counter increment, where you can atomically increment a value and get back the result. I want to create a simple external table over this HBase table, but I don't know how to choose the correct data type for Hive/Impala. The value of colfam:FL_Lock is this:

20160512_000006 column=colfam:FL_Lock, timestamp=1464120550634, value=\x00\x00\x00\x00\x00\x00\x00\x00

If I create the external table with string, the query returns nothing and no error.
If I create the external table with bigint/decimal/int, the query returns NULL and an error from Impala:

Error converting column colfam:FL_Lock: '' TO INT

Any ideas how to map this HBase column correctly? Thanks, Tomas
Labels: Apache HBase
05-07-2016 01:07 PM
Hi, is it a bug or a desired feature that creating an external table (or changing the location of an external table) is allowed only for serveradmin roles? According to the documentation, the ALL permission on the database should be sufficient, but there is also a statement that the URI has to be accessible. However, when I change my test user's permissions and remove serveradmin, he cannot create an external table pointing to his home directory, like this:

create table part ( i int, s string ) stored as textfile location '/user/testuser/part';

ERROR: AuthorizationException: User 'testuser@MYREALM.LOCAL' does not have privileges to access: hdfs://hdfscluster/user/testuser/part

After enabling the serveradmin right for testuser, the command executes correctly and the table is created. Any hints on this? Thanks
Labels: HDFS