Member since: 11-12-2018
Posts: 100
Kudos Received: 129
Solutions: 15
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1685 | 05-28-2020 09:07 PM |
| | 1483 | 05-28-2020 08:46 PM |
| | 265 | 04-09-2020 01:56 AM |
| | 312 | 04-03-2020 12:11 AM |
| | 218 | 03-31-2020 01:35 AM |
01-23-2021
06:41 AM
Hi @adrijand, yes, it seems there is a jar conflict somewhere. You are loading Hive 1.1.0 classes ahead of the ones included with Spark, and as such they can fail to reference a Hive configuration field that didn't exist in 1.1.0, like below:

java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:195)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)

Also, the description mentions you are using CDH 5.15, but your log snippets show Apache Spark (spark-2.3.0-bin-without-hadoop) and Apache Hive (apache-hive-1.1.0-bin), which are not the pre-built package versions that come with the CDH stack. Are you building against a different Hive version that you want to connect to from a remote Airflow Docker container?
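If you want Spark to talk to that older metastore rather than letting the 1.1.0 jars shadow Spark's bundled Hive classes, one option is to point Spark at the metastore jars explicitly. A minimal sketch, assuming the Hive 1.1.0 libraries live under /opt/hive-1.1.0/lib (a hypothetical path, adjust for your environment):

# spark.sql.hive.metastore.version / spark.sql.hive.metastore.jars are standard Spark SQL options;
# the jar path below is only a placeholder for wherever your Hive 1.1.0 libraries actually live.
spark-shell \
  --conf spark.sql.hive.metastore.version=1.1.0 \
  --conf spark.sql.hive.metastore.jars="/opt/hive-1.1.0/lib/*"

With this, the 1.1.0 classes are loaded in an isolated classloader used only for metastore access, instead of replacing the Hive classes Spark itself was built against.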
12-22-2020
10:31 PM
Hi @murali2425 @vchhipa, it seems there is a dependency issue while building your custom NiFi processor. The org.apache.nifi:nifi-standard-services-api-nar dependency needs to be added to the pom.xml of the nifi-*-nar module. Ref here

<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-standard-services-api-nar</artifactId>
    <version>1.11.3</version>
    <type>nar</type>
</dependency>

Please modify your pom.xml, rebuild, and see whether that fixes the issue. Please accept the answer you found most useful.
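After adding the dependency, rebuild the bundle from its root so the NAR gets repackaged; a minimal sketch, assuming a standard Maven bundle layout (the directory name is hypothetical):

# Rebuild the whole processor bundle; -DskipTests just speeds up the build.
cd nifi-myprocessor-bundle
mvn clean install -DskipTests
# Then copy the rebuilt NAR from the nar module's target/ directory into NiFi's lib/ directory and restart NiFi.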
12-22-2020
10:15 PM
Hi @TimmehG, Spark has a configurable metrics system based on the Dropwizard Metrics Library. This allows users to report Spark metrics to a variety of sinks, including HTTP, JMX, and CSV files. The metrics are generated by sources embedded in the Spark codebase; they provide instrumentation for specific activities and Spark components. The metrics system is configured via a configuration file that Spark expects at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property. Instead of using the configuration file, a set of configuration parameters with the prefix spark.metrics.conf. can be used.

I agree with you: running Spark applications continuously and reliably is a challenging task, and a good performance monitoring system is needed. Several external tools can be used to help profile the performance of Spark jobs:

- Cluster-wide monitoring tools, such as Ganglia, can provide insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia dashboard can quickly reveal whether a particular workload is disk-bound, network-bound, or CPU-bound.
- OS profiling tools such as dstat, iostat, and iotop can provide fine-grained profiling on individual nodes.
- JVM utilities such as jstack (stack traces), jmap (heap dumps), jstat (time-series statistics), and jconsole (visually exploring various JVM properties) are useful for those comfortable with JVM internals.

For more insights, you can refer to the links below:

https://spark.apache.org/docs/latest/monitoring.html
https://blog.cloudera.com/demystifying-spark-jobs-to-optimize-for-cost-and-performance/
https://www.infoq.com/articles/spark-application-monitoring-influxdb-grafana/
https://db-blog.web.cern.ch/blog/luca-canali/2017-03-measuring-apache-spark-workload-metrics-performance-troubleshooting

Please accept the answer you found most useful.
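As an illustration, here is a minimal sketch of enabling the CSV and JMX sinks through $SPARK_HOME/conf/metrics.properties. The sink classes and property names come from the Spark monitoring documentation; the output directory and polling period are just example values:

# Write a minimal metrics.properties enabling CSV and JMX sinks for all instances
# (assumes SPARK_HOME is set in the environment).
cat > "$SPARK_HOME/conf/metrics.properties" <<'EOF'
# Poll every 10 seconds and write CSV files under /tmp/spark-metrics (example values).
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=10
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics
# Also expose all metrics over JMX.
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
EOF
mkdir -p /tmp/spark-metrics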
07-20-2020
12:27 AM
1 Kudo
Can you verify whether you followed all the steps listed in the documentation? https://docs.cloudera.com/runtime/7.1.1/ozone-storing-data/topics/ozone-setting-up-ozonefs.html
06-06-2020
09:38 PM
Hi @Ettery, can you try adding those properties to nifi.properties? The Docker configuration has been updated to allow proxy whitelisting from the run command, and the host header protection is only enforced on "secured" NiFi instances. This should make it much easier to quickly deploy sandbox environments like the one you are setting up here.

You can also try passing -e NIFI_WEB_HTTP_HOST=<host> in the docker run command:

docker run --name nifi -p 9090:9090 -d -e NIFI_WEB_HTTP_PORT='9090' -e NIFI_WEB_HTTP_HOST=<host> apache/nifi:latest

There is also example configuration and documentation on GitHub for running NiFi behind a reverse proxy that you may be interested in. For more detail, refer to stackoverflow1 and stackoverflow2.
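For reference, those environment variables map to nifi.web.http.host / nifi.web.http.port in conf/nifi.properties (plus nifi.web.proxy.host on secured instances). A quick way to confirm what the container actually picked up; the conf path below is the usual location in the apache/nifi image, so adjust it if yours differs:

# Inspect the effective web/proxy properties inside the running container.
docker exec nifi grep -E 'nifi\.web\.(http\.host|http\.port|proxy\.host)' /opt/nifi/nifi-current/conf/nifi.properties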
06-06-2020
09:15 PM
Glad to hear that you have finally found the root cause of this issue. Thanks for sharing @Heri
06-05-2020
07:58 PM
1 Kudo
You can try with spark-shell --conf spark.hadoop.hive.exec.max.dynamic.partitions=xxxxx.

$ spark-shell --conf spark.hadoop.hive.exec.max.dynamic.partitions=30000
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hostname:port
Spark context available as 'sc' (master = yarn, app id = application_xxxxxxxxxxxx_xxxx).
Spark session available as 'spark'.
Welcome to Spark [ASCII banner] version 2.x.x.x.x.x.x-xx
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sqlContext.getAllConfs.get("spark.hadoop.hive.exec.max.dynamic.partitions")
res0: Option[String] = Some(30000)

Ref: SPARK-21574
05-28-2020
09:07 PM
2 Kudos
Hi @Karan1211, user 'admin' does not have access to create a directory under /user, because the /user/ directory is owned by "hdfs" with 755 permissions. As a result, only hdfs can write to that directory. If you want to create a home directory for admin so you can store files there, do:

sudo -u hdfs hdfs dfs -mkdir /user/admin
sudo -u hdfs hdfs dfs -chown admin /user/admin

Then, as admin, you can do:

hdfs dfs -put file /user/admin/

NOTE: If you get the authentication error below, your user account does not have enough permission to run the above commands; try with sudo, or first switch (su) to the hdfs user and then execute the chown command as hdfs.

su: authentication failure

I hope this helps.
05-28-2020
08:46 PM
1 Kudo
Hi @Heri, I just want to add some points here. You can use the PURGE option to delete the data files along with the partition metadata, but it works only on internal/managed tables:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec PURGE;

External tables, however, need a two-step process: alter table drop partition, then remove the files.

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec;
hdfs dfs -rm -r <partition file path>

I hope this gives some insight here. cc @aakulov
04-23-2020
08:03 PM
1 Kudo
Please can you check with your internal Linux/network team for further support? It seems you have an internal connectivity issue when connecting to the node from IntelliJ IDEA. Once you resolve the connection issue, we will check further.
04-23-2020
07:57 PM
1 Kudo
Can you add the below property to <spark_home>/conf/hive-site.xml and <hive_home>/conf/hive-site.xml?

hive.exec.max.dynamic.partitions=2000

<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2000</value>
  <description></description>
</property>

Hope this helps. Please accept the answer and vote up if it did.

Note: Restart HiveServer2 and the Spark History Server if it didn't take effect.

-JD
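To confirm that the new value is actually picked up after the restart, you can check it from a Hive session; a minimal sketch with a placeholder HiveServer2 JDBC URL:

# Replace host/port with your HiveServer2 endpoint.
beeline -u jdbc:hive2://hs2-host:10000 -e "SET hive.exec.max.dynamic.partitions;"
# Expected output: hive.exec.max.dynamic.partitions=2000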
04-21-2020
12:28 PM
1 Kudo
Can you try the article below? https://saagie.zendesk.com/hc/en-us/articles/360021384151-Read-Write-files-from-HDFS
04-21-2020
10:31 AM
1 Kudo
Hi @w12q12, as per the below error in the log trace:

20/04/21 18:20:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1067413441-127.0.0.1-1508775264580:blk_1073743149_2345 file=/data/ratings.csv
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:930)
....
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1067413441-127.0.0.1-1508775264580:blk_1073743149_2345 file=/data/ratings.csv
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:930)

it seems the client could not reach a DataNode holding that block when you ran the command. Please can you try to ping and telnet between the NameNode and DataNode (and vice versa), and also check whether you have any corrupt blocks or files in the cluster?

~JD
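To check for missing or corrupt blocks, you can run fsck against the affected file or the whole namespace; a minimal sketch, run as the hdfs superuser:

# Inspect the specific file reported in the stack trace.
sudo -u hdfs hdfs fsck /data/ratings.csv -files -blocks -locations
# List any corrupt file blocks across the whole filesystem.
sudo -u hdfs hdfs fsck / -list-corruptfileblocks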
04-09-2020
01:56 AM
1 Kudo
Hi @drgenious, are you getting an error similar to the one reported in KUDU-2633? It seems this is an open JIRA reported in the community.

ERROR core.JobRunShell: Job DEFAULT.EventKpisConsumer threw an unhandled Exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Aborting TaskSet 109.0 because task 3 (partition 3) cannot run anywhere due to node and executor blacklist. Blacklisting behavior can be configured via spark.blacklist.*.

If you have the data in HDFS in CSV/Avro/Parquet format, you can use the below command to import the files into the Kudu table.

Prerequisites: a Kudu jar with a compatible version (1.6 or higher). For more reference:

spark2-submit --master yarn/local --class org.apache.kudu.spark.tools.ImportExportFiles <path of kudu jar>/kudu-spark2-tools_2.11-1.6.0.jar --operation=import --format=<parquet/avro/csv> --master-addrs=<kudu master host>:<port number> --path=<hdfs path for data> --table-name=impala::<table name>

Hope this helps. Please accept the answer and vote up if it did.
04-03-2020
01:43 AM
1 Kudo
Hi @Gubbi, you can try the below script.

source_dir="/user/mft/inbound/"
dest_dir="/users/aws/outbound/"
for file in "$source_dir"*; do
  dest=$(echo "$file" | grep -o -P '(?<=_).*(?=\.csv)')
  mkdir -p "$dest_dir$dest"
  mv "$file" "$dest_dir$dest"
done

Hope this helps. Please accept the answer and vote up if it did.
04-03-2020
12:11 AM
2 Kudos
Hi @Mondi, the important differences between parcels and packages are:

- Parcels are self-contained and installed in a versioned directory, which means that multiple versions of a given parcel can be installed side by side. You can then designate one of these installed versions as the active one. With packages, only one package can be installed at a time, so there is no distinction between what is installed and what is active.
- You can install parcels at any location in the filesystem. They are installed by default in /opt/cloudera/parcels. In contrast, packages are installed in /usr/lib.
- When you install from the Parcels page, Cloudera Manager automatically downloads, distributes, and activates the correct parcel for the operating system running on each host in the cluster.

Note: You cannot install software using both parcels and packages in the same cluster.

Because of these unique properties, parcels offer several advantages over packages; for more details, please refer here.

Hope this helps. Please accept the answer and vote up if it did. Regards,
04-01-2020
02:23 AM
Try to clean up some old files from your disk, or else add more space to the disk.
04-01-2020
12:43 AM
Hi @Dharm, what do you mean by "bad health space" here? Is HDFS full, and are you getting critical alerts in either Ambari or Cloudera Manager? Are you able to read and write on HDFS? Please can you elaborate? It would also be helpful if you attach some screenshots so we can understand the issue. Regards
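If the concern is HDFS capacity, a couple of quick checks can narrow it down; a minimal sketch using standard HDFS commands:

# Overall capacity, used space, and per-DataNode report.
hdfs dfsadmin -report
# Filesystem-level usage summary.
hdfs dfs -df -h /
# Size of each top-level HDFS directory, to spot what is filling the space.
hdfs dfs -du -h /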
03-31-2020
01:35 AM
Hi @sppandita85BLR, currently there is no documented procedure to migrate from HDP. In these cases, it's best to engage with your local Cloudera account rep and professional services; they can help you with a runbook for the migration or assess other feasible approaches. Hope this helps. Please accept the answer and vote up if it did. Regards,
03-27-2020
11:11 AM
2 Kudos
Hi @rajisridhar, you can use a command like this to get the start and end time, and then store it wherever you wish or configure mail notifications according to your requirements. Example:

$ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe
.
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path : hdfs://localhost:8020/user/joe/workflows/map-reduce
Status : SUCCEEDED
Run : 0
User : joe
Group : users
Created : 2009-05-26 05:01 +0000
Started : 2009-05-26 05:01 +0000
Ended : 2009-05-26 05:01 +0000
Actions
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
Action Name Type Status Transition External Id External Status Error Code Start End
.----------------------------------------------------------------------------------------------------------------------------------------------------------------
hadoop1 map-reduce OK end job_200904281535_0254 SUCCEEDED - 2009-05-26 05:01 +0000 2009-05-26 05:01 +0000
.----------------------------------------------------------------------------------------------------------------------------------------------------------------

For detailed information, see: https://oozie.apache.org/docs/3.3.2/DG_CommandLineTool.html#Jobs_Operations

Hope this helps. Please accept the answer and vote up if it did.
03-27-2020
01:52 AM
Hi @JasmineD, we might need to consider backing up the following:

- flow.xml.gz
- users.xml
- authorizations.xml
- all config files in the NiFi conf directory
- NiFi local state from each node
- NiFi cluster state stored in ZooKeeper

Please make sure that you have stored the configuration passwords safely. NiFi relies on the sensitive.props.key password to decrypt sensitive property values from the flow.xml.gz file. If you do not know the sensitive props key, you would need to manually clear all encoded values from flow.xml.gz; this action clears all passwords in all components on the canvas, and you would need to re-enter all of them once NiFi is recovered. Also, any local files that are required by the dataflows need to be backed up as well (i.e., custom processor jars, user-built scripts, externally referenced config/jar files used by some processors, etc.). A minimal backup sketch follows below.

Note: All the repositories in NiFi are backed up by default. Here is a good article on how backup works in NiFi:

https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418

Hope this helps. Please accept the answer and vote up if it did.
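As an illustration, here is a minimal backup sketch assuming a default standalone install under /opt/nifi; NIFI_HOME and the backup target are placeholder paths, and the cluster state kept in ZooKeeper has to be exported separately:

# Hypothetical paths; adjust NIFI_HOME and the backup target for your environment.
NIFI_HOME=/opt/nifi
BACKUP_DIR=/backup/nifi-$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
# Flow definition, users, and authorizations (the latter two exist on secured instances).
cp "$NIFI_HOME/conf/flow.xml.gz" "$NIFI_HOME/conf/users.xml" "$NIFI_HOME/conf/authorizations.xml" "$BACKUP_DIR/"
# All remaining config files and the node's local state.
tar czf "$BACKUP_DIR/conf.tar.gz" -C "$NIFI_HOME" conf
tar czf "$BACKUP_DIR/state-local.tar.gz" -C "$NIFI_HOME" state/local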
11-24-2019
09:52 PM
Hi @anshuman, yes, we have node label support in the new CDP. For more details, you can check the CDP documentation:

https://docs.cloudera.com/ -> Cloudera Data Platform -> Runtime -> Cloudera Runtime
https://docs.cloudera.com/runtime/7.0.2/yarn-allocate-resources/topics/yarn-configuring-node-labels.html

FYI, Cloudera Runtime is the core open-source software distribution within Cloudera Data Platform (CDP) that is maintained, supported, versioned, and packaged as a single entity by Cloudera. Cloudera Runtime includes approximately 50 open source projects that comprise the core distribution of data management tools within CDP, including Cloudera Manager, which is used to configure and monitor clusters managed in CDP.
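Once node labels are enabled per the documentation above, the standard YARN CLI can create and inspect them; a minimal sketch with a placeholder label name and NodeManager host:

# Create an exclusive node label named "gpu" (placeholder name).
yarn rmadmin -addToClusterNodeLabels "gpu(exclusive=true)"
# Map a NodeManager host to that label (placeholder host name).
yarn rmadmin -replaceLabelsOnNode "nm-host1.example.com=gpu"
# Verify the labels known to the cluster.
yarn cluster --list-node-labels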
11-24-2019
09:17 PM
Hi @Tanred, ideally it should run every day at 5 PM. Please can you check the NiFi app logs for two consecutive days and look at the log entries around that time? Otherwise, paste the error trace here for further analysis.
01-15-2019
12:03 PM
1 Kudo
@Michael Bronson Decommissioning is a process that supports removing components and their hosts from the cluster. You must decommission a master or slave running on a host before removing it or its host from service. Decommissioning helps you prevent potential loss of data or disruption of service. The HDP documentation below for Ambari 2.6.1 helps you decommission a DataNode; when the DataNode decommissioning process is finished, the status display changes to Decommissioned. https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.0/bk_ambari-operations/content/how_to_decommission_a_component.html I hope that the above answers your questions.
01-14-2019
06:16 PM
1 Kudo
@Michael Bronson The article below helps you with replacing faulty disks on DataNode hosts. https://community.hortonworks.com/articles/3131/replacing-disk-on-datanode-hosts.html Please accept the answer you found most useful.
01-11-2019
01:29 PM
1 Kudo
@harish If you updated that particular queue value, then you can ignore it, because YARN automatically makes the necessary changes after you update that queue.
01-11-2019
03:06 AM
1 Kudo
@harish There are ordering policies that use different algorithms to decide which leaf queue should be scheduled first. This ordering policy is set on the parent queue, and its value is 'utilization' or 'priority-utilization' (YARN-5864). You can refer to the article below for more info: https://community.hortonworks.com/content/supportkb/174918/the-ordering-policy-setting-for-root-yarn-queue-ya.html Please accept the answer you found most useful.
01-07-2019
12:00 PM
1 Kudo
@Harish More Please can you confirm whether the new user has the appropriate permissions to access all databases like the old user? You can check all defined policies in Ranger/Sentry, depending on your cluster's security access configuration.
01-05-2019
06:54 PM
1 Kudo
@Raghav Mp The HDFS root scratch directory for Hive jobs gets created with write-all (733) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/ is created with ${hive.scratch.dir.permission}. To fix this issue, you can set the below property in hive-site.xml, which is used for setting values for the entire Hive configuration:

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>Scratch space for Hive jobs</description>
</property>

For more detail: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

Please accept the answer you found most useful.
01-05-2019
06:40 PM
@Akash Dixit If the below answer doesn't resolve the issue, please share the detailed error so we can troubleshoot.