Member since: 07-08-2016
Posts: 46
Kudos Received: 5
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 5556 | 07-21-2016 08:36 AM |
|  | 3963 | 07-12-2016 11:58 AM |
09-14-2017
01:16 PM
Hi. I have a question about the Ambari API. I want to run the hdp-configuration-utils script, but first I need some information: the number of cores, the amount of memory, the number of disks, and whether HBase is enabled (I did not install it, so the value is 'False'). My questions:

1. When I run the command `GET api/v1/clusters/c1/hosts`, I get the parameters 'cpu_count' and 'ph_cpu_count'. Which one should I use?
2. How can I check the number of disks?
3. How can I get information about free and total disk size? I found two parameters:

disk_info:

```
"disk_info" : [
  {
    "available" : "42331676",
    "device" : "/dev/mapper/VolGroup-lv_root",
    "used" : "6521952",
    "percent" : "14%",
    "size" : "51475068",
    "type" : "ext4",
    "mountpoint" : "/"
  },
  {
    "available" : "423282",
    "device" : "/dev/sda1",
    "used" : "38770",
    "percent" : "9%",
    "size" : "487652",
    "type" : "ext4",
    "mountpoint" : "/boot"
  },
  {
    "available" : "45423700",
    "device" : "/dev/mapper/VolGroup-lv_home",
    "used" : "53456",
    "percent" : "1%",
    "size" : "47917960",
    "type" : "ext4",
    "mountpoint" : "/home"
  }
]
```

metrics/disk:

```
"disk" : {
  "disk_free" : 83.99,
  "disk_total" : 95.25,
  "read_bytes" : 1.9547998208E10,
  "read_count" : 1888751.0,
  "read_time" : 2468451.0,
  "write_bytes" : 1.5247885312E10,
  "write_count" : 2020357.0,
  "write_time" : 9.9537697E7
}
```

Which one should I check when I want to compare it with the official sizing recommendations?
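As a sanity check, the two endpoints can be reconciled: summing the per-mount `disk_info` figures (which appear to be in KB, the default `df` unit) reproduces the aggregated `metrics/disk` numbers in GB. A small sketch using the values from the output above:

```python
# Reconcile the per-mount "disk_info" entries (sizes apparently in KB)
# with the aggregated "metrics/disk" figures (in GB) for the same host.
disk_info = [
    {"size": "51475068", "available": "42331676"},  # /
    {"size": "487652",   "available": "423282"},    # /boot
    {"size": "47917960", "available": "45423700"},  # /home
]

total_gb = sum(int(d["size"]) for d in disk_info) / 1024**2
free_gb = sum(int(d["available"]) for d in disk_info) / 1024**2

print(round(total_gb, 2))  # 95.25 -- matches "disk_total" : 95.25
print(round(free_gb, 2))   # 84.09 -- close to "disk_free" : 83.99
```

So the two views describe the same capacity: `disk_info` is per mount point, while `metrics/disk` is the host-wide aggregate, which suggests the aggregate is the figure to compare with sizing recommendations.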
Labels:
- Apache Ambari
07-18-2017
10:01 AM
It works! Thank you 🙂
07-17-2017
11:49 AM
1 Kudo
Hi. I have a problem with the Spark 2 interpreter in Zeppelin. I configured the interpreter like this:

When I run a query like this:

```
%spark2.sql
select var1, count(*) as counter
from database.table_1
group by var1
order by counter desc
```

the Spark job runs only 3 containers and takes 13 minutes. Does anyone know why the Spark interpreter takes only 4.9% of the queue? How should I configure the interpreter to increase this share?
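For context, the %spark2 interpreter's resource footprint is governed by standard Spark properties set in the interpreter configuration. A sketch with purely illustrative values (not the ones from my setup):

```
spark.executor.instances         20
spark.executor.cores             4
spark.executor.memory            8g
spark.dynamicAllocation.enabled  false
```

With dynamic allocation off, the container count is essentially `spark.executor.instances` plus the driver, so a low queue share usually traces back to these values.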
Labels:
- Apache Spark
- Apache Zeppelin
03-22-2017
08:55 AM
@yvora But the problem is that, because of Zeppelin, processing time in the q_apr_general queue gets longer. This is weird because the processes are in different queues, and YARN should reserve the resources available for each queue, not more. I set up a max limit but it doesn't help. Do you have any other ideas?
03-21-2017
04:18 PM
Hi. I've got a problem with YARN and the Capacity Scheduler. I created two queues:

1. default - 60%
2. q_apr_general - 40%

There is one Spark Streaming job in the 'q_apr_general' queue. Processing time for every single batch is ~2-6 seconds. In the default queue I started Zeppelin with preconfigured resources; I added one line to zeppelin-env.sh:

```
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.4.2.0-258 -Dspark.executor.instances=75 -Dspark.executor.cores=6 -Dspark.executor.memory=13G"
```

The problem is that when I execute a Spark SQL query in Zeppelin, the batch processing time grows to ~20-30 seconds. This is weird, because the Zeppelin process and the Spark Streaming job are in different queues, and the streaming job should not depend on a Zeppelin process in another queue. Does anyone know the reason for this?
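For reference, the queue split described above corresponds roughly to this capacity-scheduler.xml fragment (a sketch; the property names follow the standard `yarn.scheduler.capacity.*` convention, and the values come from the percentages above):

```xml
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,q_apr_general</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.q_apr_general.capacity</name>
  <value>40</value>
</property>
<property>
  <!-- without a maximum-capacity cap, a queue may elastically borrow
       idle resources from the other queue -->
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>60</value>
</property>
```

Note that `capacity` is only a guaranteed share: unless `maximum-capacity` caps it, the Capacity Scheduler lets a busy queue grow beyond its share into idle capacity, which can make jobs in the two queues interact.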
Labels:
- Apache Spark
- Apache YARN
- Apache Zeppelin
11-09-2016
02:01 PM
1 Kudo
Hi. I'm trying to install Impala on my cluster. I found two ways to do it:

1. HDP + Impala. There is a problem with two libraries:

```
Error: Package: impala-shell-2.7.0+cdh5.9.0+0-1.cdh5.9.0.p0.32.el6.x86_64 (cloudera-cdh5)
        Requires: libpython2.6.so.1.0()(64bit)
Error: Package: impala-2.7.0+cdh5.9.0+0-1.cdh5.9.0.p0.32.el6.x86_64 (cloudera-cdh5)
        Requires: libsasl2.so.2()(64bit)
```

I don't know where the problem is. It might be an issue with the OS (note that the packages are built for el6 while my machines run CentOS 7) or with differences between HDP and CDH.

2. The official wiki instructions. But, as you can see, the prerequisite there is Ubuntu, and I use CentOS 7.

Does anyone know an alternative way to install Impala? My cluster: HDP 2.4, CentOS 7.
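A quick way to check whether the shared libraries named in those yum errors are resolvable on the host at all (a rough check only: it finds any installed version, while the el6 RPMs need the exact `.so.1.0` / `.so.2` sonames):

```python
# Probe for the libraries the el6 Impala RPMs require:
# libpython2.6.so.1.0 and libsasl2.so.2.
import ctypes.util

for lib in ("python2.6", "sasl2"):
    path = ctypes.util.find_library(lib)
    status = path if path else "NOT FOUND"
    print(f"lib{lib}: {status}")
```

On CentOS 7 the stock packages typically provide libsasl2.so.3 and libpython2.7, which would explain why the el6-targeted RPMs fail to resolve their dependencies.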
Labels:
- Apache Impala
11-06-2016
09:47 PM
Thank you for the information 🙂
11-04-2016
09:18 AM
1 Kudo
Hi. I created an Oozie workflow that includes HDFS Fs, Sqoop and Hive jobs. The first two jobs work great - Sqoop imports data from an Oracle database and saves it to HDFS. But then there is a problem with Hive, more precisely with Tez. When I execute only one Hive statement, there is no problem:

```
LOAD DATA INPATH '/user/apb_general/dms_update' OVERWRITE INTO TABLE DMS_TEST_MATGRA;
```

But when I add another statement:

```
LOAD DATA INPATH '/user/apb_general/dms_update' OVERWRITE INTO TABLE DMS_TEST_MATGRA;
INSERT OVERWRITE TABLE DMS_TEST_MATGRA_DIST SELECT DISTINCT macaddr, techchannelname, channelzapnumber FROM DMS_TEST_MATGRA;
```

the job ends with an error:

```
11938 [main] ERROR org.apache.hadoop.hive.ql.exec.Task - Failed to execute tez graph.
java.lang.IllegalArgumentException: size of topologicalVertexStack is:3 while size of vertices is:2, make sure they are the same in order to sort the vertices
```

I found a JIRA ticket associated with this error: "DAG.createDag() does not clear local state on repeat calls". But the fixed versions are 0.7.2 and newer, while HDP provides Tez 0.7.0. Do you know how I can overcome this problem?
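One possible workaround, assuming the DAG-reuse bug from that JIRA really is the cause: run just this Hive action on MapReduce instead of Tez, so no DAG is rebuilt between statements. A sketch (standard Hive setting; untested against this exact workflow):

```sql
set hive.execution.engine=mr;
LOAD DATA INPATH '/user/apb_general/dms_update' OVERWRITE INTO TABLE DMS_TEST_MATGRA;
INSERT OVERWRITE TABLE DMS_TEST_MATGRA_DIST
SELECT DISTINCT macaddr, techchannelname, channelzapnumber FROM DMS_TEST_MATGRA;
```

An alternative along the same lines would be to split the two statements into separate Hive actions in the Oozie workflow, so each statement gets a fresh Tez DAG.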
Labels:
- Apache Hive
- Apache Oozie
- Apache Tez
09-01-2016
07:20 AM
Hi. What is - in your opinion - the best way to import an XML file into a Hive table? Is there any way to import an XML file into Hive directly? My current idea is to import the XML into an Oracle table, and then import the Oracle table into Hive using Sqoop. Do you have a better idea?
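One direct route worth sketching: the third-party hivexmlserde library lets Hive read XML files in place, without the Oracle detour. The jar path, columns and XPath expressions below are illustrative, assuming records of the form `<record>...</record>`:

```sql
ADD JAR /path/to/hivexmlserde.jar;  -- hypothetical path to the SerDe jar

CREATE TABLE xml_records (id STRING, name STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.id"   = "/record/@id",
  "column.xpath.name" = "/record/name/text()"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start" = "<record",
  "xmlinput.end"   = "</record>"
);
```

With a table like this, the XML files only need to be copied into the table's HDFS location and can then be queried directly.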
Labels:
- Apache Hive
07-22-2016
08:00 AM
Hi. I've got a little problem with the YARN ResourceManager UI and executing a job in Hue. I execute a simple query in Hue:

```
select ip, count(*)
from dns_data_huge_parquet
group by ip
having count(*) > 50
order by ip asc
```

I get results after about 10 seconds and everything looks great. But the Job Browser in Hue and the ResourceManager UI (in YARN) show that this job is still running; it only reaches the "Succeeded" status after 11-15 minutes. My question is: why does the application still show as running when the query is complete and I can see the results?
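If the query runs on Hive-on-Tez, the lingering YARN application is often the Tez session ApplicationMaster being kept alive for query reuse, not the query itself; its idle lifetime is controlled by settings like these (real Tez property names, illustrative values - this is only a hedged guess at the cause):

```
tez.session.am.dag.submit.timeout.secs=300
tez.am.session.min.held-containers=0
```

With such a session, the YARN application only transitions to "Succeeded" once the session times out and the AM exits, which would explain the 11-15 minute gap after the results appear.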
Labels:
- Apache Ambari
- Apache YARN