Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
My Accepted Solutions
Views | Posted
---|---
15476 | 11-01-2016 08:16 AM
10924 | 11-01-2016 07:45 AM
8288 | 10-25-2016 09:50 AM
1871 | 10-21-2016 03:50 AM
3653 | 10-14-2016 03:12 PM
03-03-2016
08:50 PM
2 Kudos
Usually when you want to use curl in combination with Kerberos (secured cluster), you have to use the following command:

    curl --negotiate -u : -X GET 'http://localhost:50111/templeton/v1/hive?user.name=ekoifman'

Make sure you have a valid Kerberos ticket (run: klist). A full sketch of the flow follows below.
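A minimal end-to-end sketch (the principal and realm are placeholders; substitute your own):

    # Obtain a Kerberos ticket for your user (principal is a placeholder)
    kinit ekoifman@EXAMPLE.COM
    # Verify that the ticket cache now holds a valid ticket
    klist
    # Call the WebHCat endpoint using SPNEGO (--negotiate) with the cached ticket
    curl --negotiate -u : -X GET 'http://localhost:50111/templeton/v1/hive?user.name=ekoifman'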
03-03-2016
07:25 AM
1 Kudo
What Ambari version are you using? Is it the latest one (2.2.1.0)? You might have to upgrade to the latest version so that you have the latest stack information.
03-01-2016
05:44 AM
1 Kudo
Important! Only format the NameNode if you do not have any data in your cluster! Formatting wipes the filesystem metadata, so any existing HDFS data becomes unreachable.
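For reference, this is the command in question; a sketch only, never run it against a cluster that holds data:

    # Initializes a fresh NameNode metadata directory, discarding all references to existing blocks
    hdfs namenode -format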
02-25-2016
07:57 PM
1 Kudo
Great content, thanks for sharing!
02-25-2016
01:01 PM
3 Kudos
Hi @prakash pal, there are some differences between these data types. Basically, STRING allows a variable length of characters (max 32K chars), while CHAR is a fixed-length string (max. 255 chars). Usually (I doubt that this is different with Impala) CHAR is more efficient, can speed up operations, and is better regarding memory allocation. (This does not mean you should always use CHAR.)

See this => "All data in CHAR and VARCHAR columns must be in a character encoding that is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use a STRING column to hold it."

There are a lot of use cases where it makes sense to use CHAR instead of STRING, e.g. let's say you want to have a column that stores the two-letter country code (ISO 3166-1 alpha-2; e.g. US, ES, UK, ...); here it makes more sense to use CHAR, as sketched below.
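A minimal sketch of such a table, created via the Hive CLI (the table and column names are hypothetical):

    # Lookup table storing ISO 3166-1 alpha-2 country codes in a fixed-length CHAR(2) column
    hive -e "CREATE TABLE country_codes (code CHAR(2), country_name STRING)"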
02-25-2016
06:28 AM
1 Kudo
This might be a Parquet problem, but it could also be something else. I have seen some performance and job issues when using Parquet instead of ORC. Have you seen this? https://issues.apache.org/jira/browse/HDFS-8475

What features are you missing regarding Spark with ORC? I have seen your error before, but in a different context (a query on an ORC table was failing).

Make sure your HDFS daemons (especially the DataNodes) are running and healthy. It might be related to some bad blocks, so make sure the blocks that are related to your job are ok, e.g. with the check below.
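A hedged way to verify block health (the warehouse path is a placeholder; point it at the directory your job reads):

    # Report file health, block IDs, and block locations for the given HDFS path (path is hypothetical)
    hdfs fsck /apps/hive/warehouse/mytable -files -blocks -locations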
02-22-2016
06:16 PM
1 Kudo
You should be able to see the query in the HiveServer log or a Hive-related UI, like the Hive View in Ambari or Hue (there should be a query history). The ResourceManager does not show the full query, because the job is only named after a part of the query. Why only a part? Some queries can be quite large, and the job name is limited in the number of characters it may contain. If you have shell access, you can also grep the HiveServer2 log directly, as sketched below.
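A hedged example of searching the log (the log path and message pattern are assumptions; both vary by distribution and configuration):

    # List executed statements recorded by HiveServer2 (log path is an assumption)
    grep -i "Executing command" /var/log/hive/hiveserver2.log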
02-22-2016
06:41 AM
1 Kudo
Take a look at this question, maybe it is helpful => https://community.hortonworks.com/questions/3012/what-are-the-steps-an-operator-should-take-to-repl.html
02-18-2016
05:55 PM
2 Kudos
You can specify the number of mappers that will be used for the DistCp job:

    -m <num_maps>    Maximum number of simultaneous copies: specify the number of maps used to copy data. Note that more maps may not necessarily improve throughput.

If nothing is specified, the default is 20 map tasks, as defined in the DistCp source:

    /* Default number of maps to use for DistCp */
    public static final int DEFAULT_MAPS = 20;
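An example invocation (namenode addresses and paths are placeholders):

    # Run DistCp with at most 40 simultaneous map tasks (addresses and paths are placeholders)
    hadoop distcp -m 40 hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dest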
02-18-2016
07:05 AM
3 Kudos
This sounds like the issue mentioned here: https://github.com/cloudera/hue/issues/304. However, I don't know a valid workaround for our Hue version at the moment. I strongly encourage you to use different ways to ingest large amounts of data into your cluster, e.g. a separate data ingestion node (plus hdfs commands to move files into HDFS, as sketched below), NiFi, DistCp, ...
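A minimal sketch of the hdfs-commands route from an ingestion/edge node (paths are placeholders):

    # Copy a large local file into HDFS instead of uploading it through the Hue UI (paths are placeholders)
    hdfs dfs -put /local/staging/bigfile.csv /data/landing/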