Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
My Accepted Solutions
Views | Posted
---|---
15476 | 11-01-2016 08:16 AM
10924 | 11-01-2016 07:45 AM
8288 | 10-25-2016 09:50 AM
1871 | 10-21-2016 03:50 AM
3653 | 10-14-2016 03:12 PM
03-03-2016
08:50 PM
2 Kudos
Usually when you want to use curl in combination with Kerberos (secured cluster), you have to use the following command:

    curl --negotiate -u : -X GET 'http://localhost:50111/templeton/v1/hive?user.name=ekoifman'

Make sure you have a valid Kerberos ticket (run: klist). A full sketch of the flow follows below.
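A minimal end-to-end sketch (the principal and realm are placeholders; substitute your own):

    # Obtain a Kerberos ticket for your user (principal is a placeholder)
    kinit ekoifman@EXAMPLE.COM
    # Verify that the ticket cache now holds a valid ticket
    klist
    # Call the WebHCat endpoint using SPNEGO (--negotiate) with the cached ticket
    curl --negotiate -u : -X GET 'http://localhost:50111/templeton/v1/hive?user.name=ekoifman'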
03-03-2016
07:25 AM
1 Kudo
What Ambari version are you using? Is it the latest one (2.2.1.0)? You might have to upgrade to the latest version so that you have the latest stack information.
03-01-2016
05:44 AM
1 Kudo
Important! Only format the NameNode if you do not have any data in your cluster! Formatting wipes the filesystem metadata, so any existing HDFS data becomes unreachable.
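For reference, this is the command in question; a sketch only, never run it against a cluster that holds data:

    # Initializes a fresh NameNode metadata directory, discarding all references to existing blocks
    hdfs namenode -format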
02-25-2016
07:57 PM
1 Kudo
Great content, thanks for sharing!
02-25-2016
01:01 PM
3 Kudos
Hi @prakash pal, there are some differences between these data types. Basically, STRING allows a variable length of characters (max 32K chars), while CHAR is a fixed-length string (max. 255 chars). Usually (I doubt that this is different with Impala) CHAR is more efficient, can speed up operations, and is better regarding memory allocation. (This does not mean you should always use CHAR.)

See this => "All data in CHAR and VARCHAR columns must be in a character encoding that is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use a STRING column to hold it."

There are a lot of use cases where it makes sense to use CHAR instead of STRING, e.g. let's say you want to have a column that stores the two-letter country code (ISO 3166-1 alpha-2; e.g. US, ES, UK, ...); here it makes more sense to use CHAR, as sketched below.
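A minimal sketch of such a table, created via the Hive CLI (the table and column names are hypothetical):

    # Lookup table storing ISO 3166-1 alpha-2 country codes in a fixed-length CHAR(2) column
    hive -e "CREATE TABLE country_codes (code CHAR(2), country_name STRING)"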
02-25-2016
06:28 AM
1 Kudo
This might be a Parquet problem, but it could also be something else. I have seen some performance and job issues when using Parquet instead of ORC. Have you seen this? https://issues.apache.org/jira/browse/HDFS-8475

What features are you missing regarding Spark with ORC? I have seen your error before, but in a different context (a query on an ORC table was failing).

Make sure your HDFS daemons (especially the DataNodes) are running and healthy. It might be related to some bad blocks, so make sure the blocks that are related to your job are ok, e.g. with the check below.
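A hedged way to verify block health (the warehouse path is a placeholder; point it at the directory your job reads):

    # Report file health, block IDs, and block locations for the given HDFS path (path is hypothetical)
    hdfs fsck /apps/hive/warehouse/mytable -files -blocks -locations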
02-22-2016
06:16 PM
1 Kudo
You should be able to see the query in the HiveServer log or a Hive-related UI, like the Hive View in Ambari or Hue (there should be a query history). The ResourceManager does not show the full query, because the job is only named after a part of the query. Why only a part? Some queries can be quite large, and the job name is limited in the number of characters it may contain. If you have shell access, you can also grep the HiveServer2 log directly, as sketched below.
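A hedged example of searching the log (the log path and message pattern are assumptions; both vary by distribution and configuration):

    # List executed statements recorded by HiveServer2 (log path is an assumption)
    grep -i "Executing command" /var/log/hive/hiveserver2.log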
02-22-2016
06:41 AM
1 Kudo
Take a look at this question, maybe it is helpful => https://community.hortonworks.com/questions/3012/what-are-the-steps-an-operator-should-take-to-repl.html
02-18-2016
05:55 PM
2 Kudos
You can specify the number of mappers that will be used for the DistCp job:

    -m <num_maps>    Maximum number of simultaneous copies: specify the number of maps used to copy data. Note that more maps may not necessarily improve throughput.

If nothing is specified, the default is 20 map tasks, as defined in the DistCp source:

    /* Default number of maps to use for DistCp */
    public static final int DEFAULT_MAPS = 20;
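An example invocation (namenode addresses and paths are placeholders):

    # Run DistCp with at most 40 simultaneous map tasks (addresses and paths are placeholders)
    hadoop distcp -m 40 hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dest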
02-18-2016
07:05 AM
3 Kudos
This sounds like the issue mentioned here: https://github.com/cloudera/hue/issues/304. However, I don't know a valid workaround for our Hue version at the moment. I strongly encourage you to use different ways to ingest large amounts of data into your cluster, e.g. a separate data ingestion node (plus hdfs commands to move files into HDFS, as sketched below), NiFi, DistCp, ...
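A minimal sketch of the hdfs-commands route from an ingestion/edge node (paths are placeholders):

    # Copy a large local file into HDFS instead of uploading it through the Hue UI (paths are placeholders)
    hdfs dfs -put /local/staging/bigfile.csv /data/landing/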