Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1360 | 11-26-2019 11:47 PM
 | 1309 | 11-25-2019 11:44 AM
 | 9517 | 08-07-2019 12:48 AM
 | 2193 | 04-17-2019 03:09 AM
 | 3520 | 02-18-2019 12:23 AM
04-01-2019
01:47 PM
Hi @RajeshMadurai

Ans 1: The exception posted is very generic. We would need the complete error message shown on the terminal when MSCK was run to see what went wrong. One suggestion in the meantime: by default, managed tables store their data in HDFS under "/user/hive/warehouse/<table_name>" (default database) or "/user/hive/warehouse/<db_name>.db/<table_name>". So if you created a managed table but loaded the data manually into some other HDFS path, i.e. outside "/user/hive/warehouse", the table's metadata will not get refreshed when you run MSCK REPAIR on it. That could be one of the reasons why, when you created the table as an external table (whose LOCATION can point at that custom path), the MSCK REPAIR worked as expected.

Ans 2: For an unpartitioned table, all of the table's data is stored in a single HDFS directory. For example, a table T1 in the default database with no partitions keeps all its data under "/user/hive/warehouse/T1/". Even when MSCK is never executed, queries against this table work, because the metastore already holds the HDFS location from which the files need to be read. A partitioned table, on the other hand, has a separate directory for each partition. If a new partition is added manually, by creating the directory and placing the file in HDFS, an MSCK REPAIR is needed to refresh the table's metadata so it learns about the newly added data.

Hope this helps!
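To illustrate Ans 2, here is a minimal sketch; the table t2 (partitioned by dt, in the default database) and the file name are made up for the example:

```bash
# Hypothetical partitioned table t2 (PARTITIONED BY (dt STRING)) in the default database.
# Create the partition directory and drop a data file into it by hand:
hdfs dfs -mkdir -p /user/hive/warehouse/t2/dt=2019-04-01
hdfs dfs -put data.csv /user/hive/warehouse/t2/dt=2019-04-01/

# The metastore does not know about the new partition yet, so ask Hive to
# scan the table's directory and register any missing partitions:
hive -e "MSCK REPAIR TABLE t2;"

# Now the new partition is queryable:
hive -e "SELECT COUNT(*) FROM t2 WHERE dt = '2019-04-01';"
```

For an unpartitioned table, the same manual file drop would be picked up without any MSCK, since the single table directory is already registered in the metastore.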
03-29-2019
09:11 AM
1 Kudo
If I understand correctly, you are talking about the logs in the configured --log_dir. By default Kudu will keep 10 log files per severity level. There is a flag to change that value, but it's currently marked as "experimental". It has been in Kudu for some time, so not promoting it to stable is probably a bit of an oversight; I opened an Apache Kudu JIRA (KUDU-2754) to change it to a stable config. In the meantime, you can use the --max_log_files configuration by unlocking experimental configurations via --unlock_experimental_flags.
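For example, a sketch assuming a package-based install where the tablet server reads /etc/kudu/conf/tserver.gflagfile and runs as the kudu-tserver service (adjust the path and restart mechanism for your deployment, or use the Cloudera Manager safety valve on a managed cluster):

```bash
# Unlock experimental flags and raise the per-severity log file count,
# then restart the tablet server so the flags take effect.
cat >> /etc/kudu/conf/tserver.gflagfile <<'EOF'
--unlock_experimental_flags
--max_log_files=20
EOF
sudo systemctl restart kudu-tserver
```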
03-29-2019
07:29 AM
Thanks, I'll look at it. Unfortunately we have a mandate to use Ansible and full automation to the largest extent possible, because we need to be able to set up a large variety of configurations to match what our customers use. A good model is my HDFS playbook. It:

1. installs the required YUM packages
2. formats the HDFS filesystem
3. adds the standard test users
4. prepares the Kerberos keytab files (tbd)
5. prepares the SSL keystores (tbd)

and sets the flags for standard mode (a rough shell equivalent of steps 1-3 is sketched after this post). We can then easily turn on Kerberos and/or RPC privacy via plays that modify just a few properties and restart the services.

There's an HBase playbook that sets up the HBase servers. It can use HDFS, but from the conf files it looks like we could also use a traditional local filesystem and do many of our tests without also setting up a full HDFS node. That means it would require fewer resources and could run on a smaller instance, or even on a dev's laptop. Since it's all yum and Ansible, anyone can modify the image without needing to learn new tools. TPTB are fine with creating an AMI that only requires updating the crypto material, but they want to be able to rebuild the AMI image from the most basic resources.

Hmm, I might be able to sell this particular story as an exception. The two use cases are 1) creating new configurations that we don't have a playbook for yet, and 2) verifying the configuration files for an arbitrary configuration. This won't be used in the automated tests.

(tbd: I know how to do it. The blocker is reaching a consensus on the best way to manage the resources so our applications don't require tweaking the configuration every time. Do we use a standalone KDC, an integrated solution like FreeIPA, etc.?)
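For reference, a rough sketch of the manual shell steps the playbook automates; the package, service, and user names here are illustrative only, and the real playbook drives these as yum/Ansible tasks:

```bash
# 1. Install the required YUM packages (illustrative package names)
sudo yum install -y hadoop-hdfs-namenode hadoop-hdfs-datanode

# 2. Format the HDFS filesystem (run once, as the hdfs user) and start the daemons
sudo -u hdfs hdfs namenode -format
sudo systemctl start hadoop-hdfs-namenode hadoop-hdfs-datanode

# 3. Add a standard test user and an HDFS home directory for it
sudo useradd testuser1
sudo -u hdfs hdfs dfs -mkdir -p /user/testuser1
sudo -u hdfs hdfs dfs -chown testuser1 /user/testuser1

# 4./5. Kerberos keytabs and SSL keystores are still TBD in the playbook.
```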
03-29-2019
06:33 AM
1 Kudo
To review the state, please:

1. Get the affected directory path from CM > Configuration > Scope: YARN (MR2 Included) > NodeManager Log Directory. The default is /var/log/hadoop-yarn.
2. Verify the available disk space for this path: # df -h /var/log/hadoop-yarn
3. Look up the biggest disk space consumers on this mount point (see the sketch below).
4. Clean up where possible.

If no cleanup is possible and the available disk space is still sufficient, you can decrease the alarm threshold via the CM > YARN > Configuration > Scope: NodeManager > Category: Monitoring > Log Directory Free Space Monitoring Percentage Thresholds configuration property.
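A minimal sketch of steps 2 and 3; the sort/du options assume GNU coreutils, which is what a typical RHEL/CentOS node will have:

```bash
# Free space on the mount point backing the NodeManager log directory
df -h /var/log/hadoop-yarn

# Biggest disk space consumers under that directory, largest first
du -sh /var/log/hadoop-yarn/* 2>/dev/null | sort -rh | head -20
```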
03-20-2019
09:01 AM
Ok, I figured it out. There was a mapping rule that translated my Kerberos principal name to a lower-case short name, i.e. USER1@EXAMPLE.COM becomes user1. I had entered both USER1 and USER1@EXAMPLE.COM as HBase superusers, but not user1. Tricky...
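In case anyone hits the same thing, here is how I believe you can check what short name your principal maps to; this relies on Hadoop's HadoopKerberosName class being runnable from the hadoop launcher, which it is in the versions I've used, and the result depends on the auth_to_local rules in your core-site.xml:

```bash
# Print the short name produced by the cluster's auth_to_local rules
hadoop org.apache.hadoop.security.HadoopKerberosName USER1@EXAMPLE.COM
# If this prints "user1", then "user1" (not USER1 or the full principal)
# is the name that must be listed in hbase.superuser.
```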
03-15-2019
04:56 AM
Dear @AnisurRehman

You can import data from an RDBMS to HDFS only with Sqoop. Then, if you want to work with this table through impala-shell, you only need to run the following command from a machine where Impala is installed:

impala-shell -d db_name -q "INVALIDATE METADATA tablename"

You have to do INVALIDATE because your table is new to the Impala daemons' metadata. Then, if you append new data files to the existing tablename table, you only need to do a refresh; the command is:

impala-shell -d db_name -q "REFRESH tablename"

A refresh is enough because you do not need to reload the whole metadata for the table, only the block locations of the new data files. After that you can query the table through impala-shell or the Impala query editor.
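A hedged end-to-end sketch of the flow above; the JDBC URL, credentials, database name, and table name are placeholders to adapt to your environment:

```bash
# 1. Import the RDBMS table into Hive/HDFS with Sqoop
sqoop import \
  --connect jdbc:mysql://dbhost/sourcedb \
  --username dbuser -P \
  --table tablename \
  --hive-import --hive-database db_name

# 2. Make the brand-new table visible to the Impala daemons
impala-shell -d db_name -q "INVALIDATE METADATA tablename"

# 3. After appending more data files to the same table, a refresh is enough
impala-shell -d db_name -q "REFRESH tablename"
```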
02-19-2019
05:52 PM
Hi Tomas, this is what the log file says:

[18/Feb/2019 19:34:15 -0800] thrift_util INFO Thrift exception; retrying: TSocket read 0 bytes
[18/Feb/2019 19:34:16 -0800] thrift_util INFO Thrift exception; retrying: TSocket read 0 bytes
[18/Feb/2019 19:34:16 -0800] thrift_util WARNING Out of retries for thrift call: mutateRows
[18/Feb/2019 19:34:16 -0800] thrift_util INFO Thrift saw a transport exception: TSocket read 0 bytes
[18/Feb/2019 19:34:16 -0800] exceptions_renderable ERROR Potential trace: [('/usr/lib/hue/apps/hbase/src/hbase/api.py', 46, 'query', 'return getattr(self, action)(*args)'), ('/usr/lib/hue/apps/hbase/src/hbase/api.py', 253, 'bulkUpload', 'client.mutateRows(tableName, batches, None, doas=self.user.username)'), ('/usr/lib/hue/desktop/core/src/desktop/lib/thrift_util.py', 389, 'wrapper', 'raise StructuredThriftTransportException(e, error_code=502)')]
[18/Feb/2019 19:34:16 -0800] middleware INFO Processing exception: Api Error: TSocket read 0 bytes: Traceback (most recent call last):
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/usr/lib/hue/apps/hbase/src/hbase/views.py", line 79, in api_router
    return api_dump(HbaseApi(request.user).query(*url_params))
  File "/usr/lib/hue/apps/hbase/src/hbase/api.py", line 54, in query
    raise PopupException(_("Api Error: %s") % error_msg)
PopupException: Api Error: TSocket read 0 bytes
02-19-2019
05:17 AM
Back. So far it's working fine. I also found the problem with writing to the file. The garbage data between the interceptor and the message can contain literally anything, and it contained \n, which is the LF character on Linux systems. This was causing the Kafka problem as well: Kafka sees the \n and assumes the message is two messages, not one. That's why, when I changed the delimiter to \r\n, it treated it as a single message. That's a good conclusion, I guess: if you want to write the message to a file or apply a regex to it, just replace \n and \r with an empty string so you don't have to deal with those annoying control characters. Thanks to whoever wanted to help me.
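For example, a minimal sketch of the kind of cleanup I mean; the file names are just placeholders:

```bash
# Delete every CR and LF from the captured message before writing it out
# or running a regex over it
tr -d '\r\n' < raw_message.txt > clean_message.txt
```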
02-18-2019
06:06 AM
Congratulations on solving your issue and thank you for sharing it for others who may run into something similar. 🙂