Member since: 01-26-2015
Posts: 15
Kudos Received: 3
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 24443 | 03-02-2015 09:29 AM
 | 13078 | 03-02-2015 08:35 AM
05-15-2018
09:29 AM
Vilyam, you were spot on. This issue has been fixed. Excellent pair of eyes; I am getting my own eyes inspected again :). Thanks, Kabeer.
05-12-2018
08:30 AM
The issue is pretty simple, and I am not sure whether I am missing something basic. CDH 5.12 is being used.
1. Create a Hive table over HBase, specifying the numerical columns as binary.
2. Ingest data into the table through any means.
3. Querying the table through Hive returns correct results.
4. Querying the same table through Impala gives an error.
1. Hive table created over HBase with the following syntax:
hive -e "CREATE TABLE WeatherData(\ key STRING, \ wsid STRING, \ year INT,\ month INT,\ day INT,\ hour INT,\ temperature DOUBLE,\ dewpoint DOUBLE,\ pressure DOUBLE,\ windDirection INT,\ windSpeed DOUBLE,\ skyCondition INT,\ skyConditionText STRING, \ oneHourPrecip DOUBLE, \ sixHourPrecip DOUBLE) \ STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\ WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \":key, weather:wsid, weather:year#b, weather:month#b, weather:day#b, weather:hour#b, weather:temperature#b, weather:dewpoint#b, weather:pressure#b, weather:windDirection#b, weather:windSpeed#b, weather:skyCondition#b, weather:skyConditionText, weather:oneHourPrecip#b, weather:sixHourPrecip#b \")\ TBLPROPERTIES (\"hbase.table.name\" = \"WeatherData\", \"hbase.mapred.output.outputtable\" = \"WeatherData\");"
2. Ingest data into the HBase table through any preferred means.
3. The results of the query through Hive are as below:
hive> SELECT * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0";
Query ID = cloudera_20180512144646_baded4e1-8a5c-4fb4-bd47-8293fc910b44
Total MapReduce CPU Time Spent: 8 seconds 10 msec
OK
002020ed-8496-4780-8ae2-fdc820d0e4e0 725030:14732 2008 11 17 12 3.9 -4.4 1016.9 270 5.7 0 0.0 0.0 NULL
Time taken: 24.62 seconds, Fetched: 1 row(s)
4. The same query through Impala gives the error below.
[quickstart.cloudera:21000] > SELECT * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0";
Query: select * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0"
Query submitted at: 2018-05-12 14:47:47 (Coordinator: http://quickstart.cloudera:25000)
ERROR: AnalysisException: Failed to load metadata for table: 'default.WeatherData'
CAUSED BY: TableLoadingException: Failed to load metadata for HBase table: weatherdata
CAUSED BY: SerDeException: Error: A column storage specification is one of the following: '-', a prefix of 'string', or a prefix of 'binary'. b is not a valid storage option specification for sixhourprecip
Looking into the Impala code, it seems that the binary data type is not supported as a column storage type. See: https://github.com/cloudera/Impala/blob/0c713cf67959b9633d2fe6f5c21af218a43e4214/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java#L265-L271
How are others dealing with this requirement?
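In case it is useful for the discussion, one generic fallback is sketched below. It is untested and the table name WeatherData_parquet is only an illustration: let Hive, which reads the #b mappings fine, materialize the HBase-backed table into a format Impala handles natively, then point Impala at that copy.

# Materialize the HBase-backed table into Parquet via Hive (illustrative sketch)
hive -e "CREATE TABLE WeatherData_parquet STORED AS PARQUET AS SELECT * FROM default.WeatherData;"
# Make the new table visible to Impala and query the copy instead
impala-shell -q "INVALIDATE METADATA default.WeatherData_parquet"
impala-shell -q "SELECT * FROM default.WeatherData_parquet WHERE key='002020ed-8496-4780-8ae2-fdc820d0e4e0'"

The obvious cost is that the Parquet copy has to be refreshed whenever the HBase data changes, so it only suits workloads that can tolerate a batch copy.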
Labels:
- Apache HBase
- Apache Hive
- Apache Impala
05-10-2016
09:34 AM
Kindly let me know what you mean when you mention that you too are facing a similar issue. As pointed out by Harsh, and as I have already mentioned, I see this as a non-issue and have moved on.
02-16-2016
03:15 PM
And if someone is wondering what the Java defunct processes (zombie processes) I am talking about are, the Docker issue that goes into the specifics is: https://github.com/docker/docker/issues/18502
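For anyone who wants to check their own container, a quick way to list defunct processes (plain ps and awk, nothing CDH-specific; shown only as a sketch):

# Print the header plus any process whose state starts with Z (zombie/defunct)
ps -eo pid,ppid,stat,comm | awk 'NR==1 || $3 ~ /^Z/'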
02-16-2016
06:51 AM
Harsh, you make a fair point. I will add the information you requested, for you and for the benefit of others who hit this thread.

First of all: this is not an issue, as you have quite rightly said.

My scenario: we plan to upgrade CDH from 5.4.x to 5.5.<LATEST>, so we wanted a dry run in Docker to spot any sticking issues. This is one of the issues that stood out. The upgrade failed during first-time initialization, towards the end. I logged into the CM home page and started the services manually, which seemed to work fine for all the processes so far, except for Hive.

Today I sat with a colleague to disable the canary and still found that the Hive Metastore was going down. Repeated restarts from CM appeared to restart HS2 and the Metastore successfully, followed by an immediate shutdown with an error message pointing to KMS. I added KMS as well to check if it would solve the issue, but even that did not help.

Upon checking further we found that the HS2 and HM logs had not been updated past yesterday midnight. So we removed those files and started the Hive HS2 and HM processes again, and could finally see that HS2 and HM start without issues.

We further think the issue is due to zombie Java processes that seem to hold a lock on the log files, somehow blocking the restart of the Hive processes through CM. Surprisingly, CM does not give a warning when it is actually unable to start the processes; instead it ends up showing a green mark.

We know Docker should not be compared with a real OS, and this was our own Docker image built for internal testing. But we see the same issue, i.e. agents not actually doing what they claim to be doing (e.g. starting the Hive processes), even in the latest Cloudera QuickStart Docker image.

I hope our experience helps someone struggling with similar issues. Thanks for your response though. Kabeer.
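For reference, a rough sketch of the manual cleanup described above. The paths are the usual CDH defaults and are assumptions; adjust them to your deployment:

# See whether anything is still holding the Hive role logs open
lsof +D /var/log/hive 2>/dev/null
# Move the stale HiveServer2 / Metastore logs aside, then restart both roles from Cloudera Manager
mkdir -p /tmp/hive-stale-logs
mv /var/log/hive/*.log* /tmp/hive-stale-logs/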
02-15-2016
05:15 PM
Hi, this looks like the bug https://issues.apache.org/jira/browse/HDFS-7931, which was fixed only in HDFS 2.7.1, while CDH 5.5.1 still uses HDFS 2.6.0. A log snapshot is below:

Feb 15, 11:37:41.636 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 add_partition : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:41.717 PM INFO org.apache.hadoop.hive.common.FileUtils Creating directory if it doesn't exist: hdfs://cdh-docker:8020/user/hue/.cloudera_manager_hive_metastore_canary/hive_HIVEMETASTORE_51059e403b58f7a4f83fffcc4add47c6/cm_test_table/p1=p0/p2=420
Feb 15, 11:37:42.508 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 add_partition : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:42.509 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 add_partition : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:42.588 PM INFO org.apache.hadoop.hive.common.FileUtils Creating directory if it doesn't exist: hdfs://cdh-docker:8020/user/hue/.cloudera_manager_hive_metastore_canary/hive_HIVEMETASTORE_51059e403b58f7a4f83fffcc4add47c6/cm_test_table/p1=p1/p2=421
Feb 15, 11:37:44.537 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 get_table : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:44.551 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 get_table : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:46.384 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 drop_table : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:46.385 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 drop_table : db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 tbl=CM_TEST_TABLE
Feb 15, 11:37:46.420 PM ERROR org.apache.hadoop.hdfs.KeyProviderCache Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
Feb 15, 11:37:46.602 PM INFO hive.metastore.hivemetastoressimpl deleting hdfs://cdh-docker:8020/user/hue/.cloudera_manager_hive_metastore_canary/hive_HIVEMETASTORE_51059e403b58f7a4f83fffcc4add47c6/cm_test_table
Feb 15, 11:37:46.609 PM INFO org.apache.hadoop.fs.TrashPolicyDefault Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Feb 15, 11:37:46.657 PM INFO hive.metastore.hivemetastoressimpl Moved to trash: hdfs://cdh-docker:8020/user/hue/.cloudera_manager_hive_metastore_canary/hive_HIVEMETASTORE_51059e403b58f7a4f83fffcc4add47c6/cm_test_table
Feb 15, 11:37:46.770 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 get_database: cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.778 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 get_database: cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.805 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 drop_database: cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.806 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 drop_database: cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.831 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 get_all_tables: db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.831 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 get_all_tables: db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.851 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: source:172.17.0.2 get_functions: db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 pat=*
Feb 15, 11:37:46.852 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=source:172.17.0.2 get_functions: db=cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 pat=*
Feb 15, 11:37:46.876 PM INFO org.apache.hadoop.hive.metastore.ObjectStore Dropping database cloudera_manager_metastore_canary_test_db_hive_hivemetastore_51059e403b58f7a4f83fffcc4add47c6 along with all tables
Feb 15, 11:37:46.922 PM INFO hive.metastore.hivemetastoressimpl deleting hdfs://cdh-docker:8020/user/hue/.cloudera_manager_hive_metastore_canary/hive_HIVEMETASTORE_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:46.946 PM INFO org.apache.hadoop.fs.TrashPolicyDefault Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Feb 15, 11:37:47.018 PM INFO hive.metastore.hivemetastoressimpl Moved to trash: hdfs://cdh-docker:8020/user/hue/.cloudera_manager_hive_metastore_canary/hive_HIVEMETASTORE_51059e403b58f7a4f83fffcc4add47c6
Feb 15, 11:37:50.429 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: Shutting down the object store...
Feb 15, 11:37:50.430 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=Shutting down the object store...
Feb 15, 11:37:50.431 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore 19: Metastore shutdown complete.
Feb 15, 11:37:50.431 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit ugi=hue ip=172.17.0.2 cmd=Metastore shutdown complete.
Feb 15, 11:39:42.372 PM INFO org.apache.hadoop.hive.metastore.HiveMetaStore Shutting down hive metastore.
05-24-2015
02:04 AM
Let's do the following:

1. 4 GB of RAM is too little to run pseudo-distributed mode, in my experience. I did not dig into the details, but another friend of mine has an 8 GB laptop with Cloudera Manager and the Oozie job submitted through Hue still hangs. The same code and setup run well on my laptop, which has about 16 GB of RAM.

2. On my friend's 8 GB laptop I used this method to verify that an Oozie job launched through Hue can run: go to http://localhost:8888/oozie/list_oozie_workflows/ after the job has been submitted. I could see that two workflows were started when I submitted my Oozie job. The first one was hanging, its heartbeat taking up all the resources and preventing the other from starting. So I would kill the hanging (heartbeat) workflow and immediately see the Oozie job get scheduled and complete the necessary work. Please try this. If Cloudera Manager were in your setup it would have been easy to increase the RAM and cores available to each container, so I am not sure how you will handle this. I would ideally have liked to write out all the configuration parameters that need to be changed, but I am busy with a few other things right now; the usual suspects are sketched below.

3. Please also try to increase the vcores by a good number, say add another 100, even though this sounds ridiculous.

These steps will show that we have found the issue, and then the way forward will be clear. Hope this helps, Kabeer.
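For reference, the YARN settings that usually matter here are yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores and yarn.scheduler.maximum-allocation-mb; in Cloudera Manager these surface as the container memory and vcore limits under YARN Resource Management. A quick way to see what a pseudo-distributed box is currently allowed to use, assuming the common /etc/hadoop/conf location (adjust if your client configs live elsewhere):

# Show the current YARN resource limits and the line after each (the <value>)
grep -A1 -E "yarn\.(nodemanager\.resource\.(memory-mb|cpu-vcores)|scheduler\.maximum-allocation-mb)" /etc/hadoop/conf/yarn-site.xml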
05-23-2015
01:48 PM
Hi, can you please confirm the physical RAM of the machine hosting the pseudo-distributed mode? I had a similar issue when I moved from 5.2 to 5.3. All I did was go back to the default values, especially the Resource Manager settings for the virtual cores and the memory available to containers. I hope this helps. If not, please post the information requested and I will try my best to help as much as I can. Thanks, Kabeer.
03-02-2015
09:29 AM
1 Kudo
Increasing the container memory to 8 GB within YARN solved the issue. In the section Resource Manager Default Group -> Resource Management, set Container Memory Maximum to 8 GB.
03-02-2015
08:35 AM
2 Kudos
Never mind, the issue got resolved. I thought I would post the job.properties and workflow.xml so that anyone else hitting this issue can refer to them here:

workflow.xml:

<workflow-app xmlns="uri:oozie:workflow:0.4" name="oozie-wf">
  <start to="sqoop-wf1"/>
  <action name="sqoop-wf1">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect jdbc:mysql://localhost/nosql --table forHive --username root --password test123 --m 1 --target-dir /user/kabeer/001</command>
      <archive>/tmp/mysql-connector-java-5.1.33-bin.jar#mysql-connector-java-5.1.33-bin.jar</archive>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

job.properties:

nameNode=hdfs://localhost:8020
job-tracker=localhost:8032
jobTracker=localhost:8032
queueName=default
weatherRoot=oozie
#mapreduce.jobtracker.kerberos.principal=foo
#dfs.namenode.kerberos.principal=foo
oozie.libpath=${nameNode}/user/oozie/share/lib/lib_20150226170905
# oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot}
oozie.wf.application.path=${nameNode}/user/kabeer/${weatherRoot}
outputDir=sqoop-output
oozie.use.system.libpath=true
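For completeness, a hedged example of how such a workflow is typically submitted from the shell. It assumes the default Oozie URL on localhost, that job.properties sits in the current directory, and that workflow.xml has already been uploaded to the HDFS path given by oozie.wf.application.path:

# Submit the workflow and note the returned job id
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
# Check its status (replace <job-id> with the id printed above)
oozie job -oozie http://localhost:11000/oozie -info <job-id>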