Member since: 07-08-2016
Posts: 46
Kudos Received: 5
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 5556 | 07-21-2016 08:36 AM |
|  | 3963 | 07-12-2016 11:58 AM |
09-14-2017
01:16 PM
Hi. I have a question about the Ambari API. I want to run the hdp-configuration-utils script, but first I need some information: the number of cores, the amount of memory, the number of disks, and whether HBase is enabled (I did not install it, so the value is 'False'). My questions:

1. When I run the command `GET api/v1/clusters/c1/hosts`, I get the parameters 'cpu_count' and 'ph_cpu_count'. Which one should I use?
2. How can I check the number of disks?
3. How can I get information about free and total disk size? I found two parameters:

disk_info:

```
"disk_info" : [
  {
    "available" : "42331676",
    "device" : "/dev/mapper/VolGroup-lv_root",
    "used" : "6521952",
    "percent" : "14%",
    "size" : "51475068",
    "type" : "ext4",
    "mountpoint" : "/"
  },
  {
    "available" : "423282",
    "device" : "/dev/sda1",
    "used" : "38770",
    "percent" : "9%",
    "size" : "487652",
    "type" : "ext4",
    "mountpoint" : "/boot"
  },
  {
    "available" : "45423700",
    "device" : "/dev/mapper/VolGroup-lv_home",
    "used" : "53456",
    "percent" : "1%",
    "size" : "47917960",
    "type" : "ext4",
    "mountpoint" : "/home"
  }
]
```

metrics/disk:

```
"disk" : {
  "disk_free" : 83.99,
  "disk_total" : 95.25,
  "read_bytes" : 1.9547998208E10,
  "read_count" : 1888751.0,
  "read_time" : 2468451.0,
  "write_bytes" : 1.5247885312E10,
  "write_count" : 2020357.0,
  "write_time" : 9.9537697E7
}
```

Which one should I check when I want to compare it with the official sizing recommendations?
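As a sanity check, the two endpoints can be reconciled: summing the per-mount `disk_info` figures (which appear to be in KB, the default `df` unit) reproduces the aggregated `metrics/disk` numbers in GB. A small sketch using the values from the output above:

```python
# Reconcile the per-mount "disk_info" entries (sizes apparently in KB)
# with the aggregated "metrics/disk" figures (in GB) for the same host.
disk_info = [
    {"size": "51475068", "available": "42331676"},  # /
    {"size": "487652",   "available": "423282"},    # /boot
    {"size": "47917960", "available": "45423700"},  # /home
]

total_gb = sum(int(d["size"]) for d in disk_info) / 1024**2
free_gb = sum(int(d["available"]) for d in disk_info) / 1024**2

print(round(total_gb, 2))  # 95.25 -- matches "disk_total" : 95.25
print(round(free_gb, 2))   # 84.09 -- close to "disk_free" : 83.99
```

So the two views describe the same capacity: `disk_info` is per mount point, while `metrics/disk` is the host-wide aggregate, which suggests the aggregate is the figure to compare with sizing recommendations.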
Labels:
- Apache Ambari
07-18-2017
10:01 AM
It works! Thank you 🙂
07-17-2017
11:49 AM
1 Kudo
Hi. I have a problem with the Spark 2 interpreter in Zeppelin. I configured the interpreter like this:

When I run a query like this:

```
%spark2.sql
select var1, count(*) as counter
from database.table_1
group by var1
order by counter desc
```

the Spark job runs only 3 containers and takes 13 minutes. Does anyone know why the Spark interpreter takes only 4.9% of the queue? How should I configure the interpreter to increase this share?
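For context, the %spark2 interpreter's resource footprint is governed by standard Spark properties set in the interpreter configuration. A sketch with purely illustrative values (not the ones from my setup):

```
spark.executor.instances         20
spark.executor.cores             4
spark.executor.memory            8g
spark.dynamicAllocation.enabled  false
```

With dynamic allocation off, the container count is essentially `spark.executor.instances` plus the driver, so a low queue share usually traces back to these values.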
Labels:
- Apache Spark
- Apache Zeppelin
03-22-2017
08:55 AM
@yvora But the problem is that, because of Zeppelin, processing time in the q_apr_general queue gets longer. This is weird because the processes are in different queues, and YARN should reserve the resources available for each queue, not more. I set up a max limit but it doesn't help. Do you have any other ideas?
03-21-2017
04:18 PM
Hi. I've got a problem with YARN and the Capacity Scheduler. I created two queues:

1. default - 60%
2. q_apr_general - 40%

There is one Spark Streaming job in the 'q_apr_general' queue. Processing time for every single batch is ~2-6 seconds. In the default queue I started Zeppelin with preconfigured resources; I added one line to zeppelin-env.sh:

```
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.4.2.0-258 -Dspark.executor.instances=75 -Dspark.executor.cores=6 -Dspark.executor.memory=13G"
```

The problem is that when I execute a Spark SQL query in Zeppelin, the batch processing time grows to ~20-30 seconds. This is weird, because the Zeppelin process and the Spark Streaming job are in different queues, and the streaming job should not depend on a Zeppelin process in another queue. Does anyone know the reason for this?
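For reference, the queue split described above corresponds roughly to this capacity-scheduler.xml fragment (a sketch; the property names follow the standard `yarn.scheduler.capacity.*` convention, and the values come from the percentages above):

```xml
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,q_apr_general</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.q_apr_general.capacity</name>
  <value>40</value>
</property>
<property>
  <!-- without a maximum-capacity cap, a queue may elastically borrow
       idle resources from the other queue -->
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>60</value>
</property>
```

Note that `capacity` is only a guaranteed share: unless `maximum-capacity` caps it, the Capacity Scheduler lets a busy queue grow beyond its share into idle capacity, which can make jobs in the two queues interact.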
Labels:
- Apache Spark
- Apache YARN
- Apache Zeppelin
11-09-2016
02:01 PM
1 Kudo
Hi. I'm trying to install Impala on my cluster. I found two ways to do it:

1. HDP + Impala. There is a problem with two libraries:

```
Error: Package: impala-shell-2.7.0+cdh5.9.0+0-1.cdh5.9.0.p0.32.el6.x86_64 (cloudera-cdh5)
        Requires: libpython2.6.so.1.0()(64bit)
Error: Package: impala-2.7.0+cdh5.9.0+0-1.cdh5.9.0.p0.32.el6.x86_64 (cloudera-cdh5)
        Requires: libsasl2.so.2()(64bit)
```

I don't know where the problem is. It might be an issue with the OS (note that the packages are built for el6 while my machines run CentOS 7) or with differences between HDP and CDH.

2. The official wiki instructions. But, as you can see, the prerequisite there is Ubuntu, and I use CentOS 7.

Does anyone know an alternative way to install Impala? My cluster: HDP 2.4, CentOS 7.
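A quick way to check whether the shared libraries named in those yum errors are resolvable on the host at all (a rough check only: it finds any installed version, while the el6 RPMs need the exact `.so.1.0` / `.so.2` sonames):

```python
# Probe for the libraries the el6 Impala RPMs require:
# libpython2.6.so.1.0 and libsasl2.so.2.
import ctypes.util

for lib in ("python2.6", "sasl2"):
    path = ctypes.util.find_library(lib)
    status = path if path else "NOT FOUND"
    print(f"lib{lib}: {status}")
```

On CentOS 7 the stock packages typically provide libsasl2.so.3 and libpython2.7, which would explain why the el6-targeted RPMs fail to resolve their dependencies.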
Labels:
- Apache Impala
11-06-2016
09:47 PM
Thank you for the information 🙂
11-04-2016
09:18 AM
1 Kudo
Hi. I created an Oozie workflow that includes HDFS Fs, Sqoop and Hive jobs. The first two jobs work great - Sqoop imports data from an Oracle database and saves it to HDFS. But then there is a problem with Hive, more precisely with Tez. When I execute only one Hive statement, there is no problem:

```
LOAD DATA INPATH '/user/apb_general/dms_update' OVERWRITE INTO TABLE DMS_TEST_MATGRA;
```

But when I add another statement:

```
LOAD DATA INPATH '/user/apb_general/dms_update' OVERWRITE INTO TABLE DMS_TEST_MATGRA;
INSERT OVERWRITE TABLE DMS_TEST_MATGRA_DIST SELECT DISTINCT macaddr, techchannelname, channelzapnumber FROM DMS_TEST_MATGRA;
```

the job ends with an error:

```
11938 [main] ERROR org.apache.hadoop.hive.ql.exec.Task - Failed to execute tez graph.
java.lang.IllegalArgumentException: size of topologicalVertexStack is:3 while size of vertices is:2, make sure they are the same in order to sort the vertices
```

I found a JIRA ticket associated with this error: "DAG.createDag() does not clear local state on repeat calls". But the fixed versions are 0.7.2 and newer, while HDP provides Tez 0.7.0. Do you know how I can overcome this problem?
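One possible workaround, assuming the DAG-reuse bug from that JIRA really is the cause: run just this Hive action on MapReduce instead of Tez, so no DAG is rebuilt between statements. A sketch (standard Hive setting; untested against this exact workflow):

```sql
set hive.execution.engine=mr;
LOAD DATA INPATH '/user/apb_general/dms_update' OVERWRITE INTO TABLE DMS_TEST_MATGRA;
INSERT OVERWRITE TABLE DMS_TEST_MATGRA_DIST
SELECT DISTINCT macaddr, techchannelname, channelzapnumber FROM DMS_TEST_MATGRA;
```

An alternative along the same lines would be to split the two statements into separate Hive actions in the Oozie workflow, so each statement gets a fresh Tez DAG.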
Labels:
- Apache Hive
- Apache Oozie
- Apache Tez
09-01-2016
07:20 AM
Hi. What is - in your opinion - the best way to import an XML file into a Hive table? Is there any way to import an XML file into Hive directly? My current idea is to import the XML into an Oracle table, and then import the Oracle table into Hive using Sqoop. Do you have a better idea?
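One direct route worth sketching: the third-party hivexmlserde library lets Hive read XML files in place, without the Oracle detour. The jar path, columns and XPath expressions below are illustrative, assuming records of the form `<record>...</record>`:

```sql
ADD JAR /path/to/hivexmlserde.jar;  -- hypothetical path to the SerDe jar

CREATE TABLE xml_records (id STRING, name STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.id"   = "/record/@id",
  "column.xpath.name" = "/record/name/text()"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start" = "<record",
  "xmlinput.end"   = "</record>"
);
```

With a table like this, the XML files only need to be copied into the table's HDFS location and can then be queried directly.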
Labels:
- Apache Hive
07-22-2016
08:00 AM
Hi. I've got a little problem with the YARN ResourceManager UI and executing a job in Hue. I execute a simple query in Hue:

```
select ip, count(*)
from dns_data_huge_parquet
group by ip
having count(*) > 50
order by ip asc
```

I get results after about 10 seconds and everything looks great. But the Job Browser in Hue and the ResourceManager UI (in YARN) show that this job is still running; it only reaches the "Succeeded" status after 11-15 minutes. My question is: why does the application still show as running when the query is complete and I can see the results?
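If the query runs on Hive-on-Tez, the lingering YARN application is often the Tez session ApplicationMaster being kept alive for query reuse, not the query itself; its idle lifetime is controlled by settings like these (real Tez property names, illustrative values - this is only a hedged guess at the cause):

```
tez.session.am.dag.submit.timeout.secs=300
tez.am.session.min.held-containers=0
```

With such a session, the YARN application only transitions to "Succeeded" once the session times out and the AM exits, which would explain the 11-15 minute gap after the results appear.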
Labels:
- Apache Ambari
- Apache YARN