Member since 01-31-2018 · 2 Posts · 0 Kudos Received · 0 Solutions
08-09-2018 06:02 PM
I am tuning a Spark application and noticed discrepancies between the job's metrics shown on the Spark History Server UI and the YARN ResourceManager UI. I've specified the following properties on my Zeppelin notebook's Spark interpreter:

master yarn-client
spark.app.name Zeppelin
spark.cores.max
spark.driver.memory 3g
spark.executor.cores 3
spark.executor.instances 2
spark.executor.memory 4g

When I look at the YARN ResourceManager UI, I see no evidence that the executors' containers are getting 3 cores each; each one shows only 1 vcore. Yet when I check the Spark History Server, it describes each running executor as having 3 cores and reflects all the properties I've specified. What's up with this? Which of these should I be looking at?

YARN 3.1.0, Zeppelin 0.8.0, Spark2 2.3.1
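One way to cross-check what the two UIs report is to ask the running application itself. This is a minimal sketch for a Zeppelin %spark paragraph, assuming the interpreter exposes the usual `sc` SparkContext; it reads back the configuration Spark is actually using, independent of either UI:

```scala
// Run in a Zeppelin %spark paragraph; `sc` is the SparkContext provided
// by the interpreter. getConf returns the effective runtime settings.
println(sc.getConf.get("spark.executor.cores"))
println(sc.getConf.get("spark.executor.instances"))
println(sc.getConf.get("spark.executor.memory"))

// List the executors the driver actually registered (the driver itself
// appears in this map as well).
sc.getExecutorMemoryStatus.keys.foreach(println)
```

If Spark reports 3 cores per executor here, the discrepancy is likely only in how the ResourceManager UI accounts for vcores, not in the resources the executors actually received.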
Labels:
- Apache Spark
- Apache YARN
- Apache Zeppelin
02-02-2018 09:28 PM
// This query:
sqlContext.sql("select * from retail_invoice").show
// gives this output:
+---------+---------+-----------+--------+-----------+---------+----------+-------+
|invoiceno|stockcode|description|quantity|invoicedate|unitprice|customerid|country|
+---------+---------+-----------+--------+-----------+---------+----------+-------+
+---------+---------+-----------+--------+-----------+---------+----------+-------+
// The Hive DDL for the table in HiveView 2.0:
CREATE TABLE `retail_invoice`(
`invoiceno` string,
`stockcode` string,
`description` string,
`quantity` int,
`invoicedate` string,
`unitprice` double,
`customerid` string,
`country` string)
CLUSTERED BY (
stockcode)
INTO 2 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://hadoopsilon2.zdwinsqlad.local:8020/apps/hive/warehouse/retail_invoice'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"country\":\"true\",\"quantity\":\"true\",\"customerid\":\"true\",\"description\":\"true\",\"invoiceno\":\"true\",\"unitprice\":\"true\",\"invoicedate\":\"true\",\"stockcode\":\"true\"}}',
'numFiles'='2',
'numRows'='541909',
'orc.bloom.filter.columns'='StockCode, InvoiceDate, Country',
'rawDataSize'='333815944',
'totalSize'='5642889',
'transactional'='true',
'transient_lastDdlTime'='1517516006')

I can query the data in Hive just fine. The data is inserted from NiFi using the PutHiveStreaming processor. We have tried recreating the table, but the same problem arises, and I haven't found any odd-looking configurations. Any ideas on what could be going on here?
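Since the DDL shows `'transactional'='true'` and the rows arrive via Hive Streaming, one thing worth checking is what actually sits in the table directory. Streaming ingest writes ACID `delta_*` subdirectories, and to my knowledge plain Spark 2.x SQL readers do not merge those deltas, which could explain an empty result even though Hive reads the rows fine. A sketch, using the warehouse path from the DDL above and the same Spark session:

```scala
import org.apache.hadoop.fs.Path

// Inspect the table's warehouse directory. If everything is in
// delta_* directories (no base files), Spark's reader may see no data.
val tableDir = new Path("/apps/hive/warehouse/retail_invoice")
val fs = tableDir.getFileSystem(sc.hadoopConfiguration)
fs.listStatus(tableDir).foreach(s => println(s.getPath.getName))
```

If the listing is mostly `delta_*` entries, running a Hive compaction (or reading through a connector that understands ACID tables) would be the direction to investigate.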
Labels:
- Apache Hive
- Apache NiFi
- Apache Spark