Member since: 01-19-2017
Posts: 3480
Kudos Received: 561
Solutions: 343
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 211 | 03-01-2021 12:58 PM
 | 367 | 03-01-2021 12:41 PM
 | 404 | 03-01-2021 10:11 AM
 | 213 | 01-04-2021 09:47 AM
 | 400 | 01-03-2021 01:45 PM
04-19-2021
01:53 PM
@vidanimegh What do your /etc/hosts entries look like? Remember that both hosts' /etc/hosts files should have the same entries:
On the source: the <source-nameservice> IP and the <destination-nameservice> IP
On the destination: the <source-nameservice> IP and the <destination-nameservice> IP
Please let me know.
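For illustration, a minimal sketch of what both files could contain (the IPs and hostnames below are placeholders, not your actual values):
# /etc/hosts on BOTH the source and the destination hosts
192.168.1.10   source-nn1.example.com        source-nn1        # source nameservice host
192.168.1.20   destination-nn1.example.com   destination-nn1   # destination nameservice host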
04-19-2021
08:07 AM
@vidanimegh Can you try it differently like below and let me know?
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://<source-nameservice>/path/to/source/folder hdfs://<destination-nameservice>/path/to/destination/folder
Always try to use the nameservice!
04-14-2021
12:57 PM
@TDStephanieSoft I strongly suspect you downloaded HDF 3.1.1 instead of HDP 3.0.1, because the links are on the same page. Please verify that you are really trying to access HDP. Can you double-check the .ova file you downloaded? If you see 3.1.1, then you downloaded HDF, aka NiFi. Hope that helps
04-02-2021
03:09 PM
@shaz11 Can you share a screenshot of your policy in Ranger? Once you enable the Ranger plugin, authorization is automatically delegated to Ranger, so for any user to access the Hive tables the permissions must be explicitly granted from Ranger! Hope that helps!
04-01-2021
01:10 AM
@swapko That's bizarre. Can you share your steps and the source of your download?
03-18-2021
03:05 AM
@emeric https://www.linkedin.com/in/sheltong/
03-18-2021
02:38 AM
@emeric Same name and photo as on my LinkedIn profile.
03-18-2021
02:00 AM
@emeric Cool, so how will you send it safely? Check my LinkedIn; it should be easy to connect 🙂
03-17-2021
02:22 PM
@emeric Twitter has made it difficult to register any app, so I am waiting for approval. Honestly, I couldn't stand the 200-word essay explaining what I intend to do, whether I am part of a government, etc. I just copied and pasted some text from the website; I hope I pass the review 🙂 By Friday I should be good to go
03-17-2021
12:01 AM
@emeric Could you try substituting the current value in flume.conf with one of the below paths?
hdfs://10.0.2.15:8020/user/flume/tweets/
hdfs://127.0.0.1:8020/user/flume/tweets/
hdfs://192.168.56.101:8020/user/flume/tweets/
Let me know
03-16-2021
01:26 PM
@emeric What is the output of the below command from the Quickstart sandbox CLI?
$ ifconfig
I am thinking we are on the right path. If you don't succeed, I will download a sandbox tomorrow and try to reproduce your situation. Happy hadooping
03-16-2021
04:48 AM
@emeric That looks like a hostname issue; this appears to be the offending line:
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
Can you replace quickstart.cloudera:8020/user/flume/tweets/ with <Sandbox-IP>:8020/user/flume/tweets/? Please let me know
03-15-2021
01:49 PM
@ryu Try the following solution. Always note all the changes you make, in case you need to revert. Follow these steps to resolve the issue:
1. Open Ambari.
2. Go to TEZ / Configs / Advanced tez-site.
3. Locate the configuration tez.history.logging.service.class.
4. Replace the value org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService with the new value org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.
5. Save the configuration changes.
6. Restart all services that Ambari asks you to restart.
Then retry:
[root@test02 ~]# hive
Please revert
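For reference, once saved the change should end up in tez-site.xml roughly as below (a sketch only; Ambari manages this file, so make the change through the Ambari UI rather than by hand):
<property>
  <name>tez.history.logging.service.class</name>
  <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>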
03-15-2021
01:36 PM
@totti1 This is all about HMS (Hive Metastore) metadata refreshing. Spark SQL caches Parquet metadata for better performance. When Hive metastore Parquet table conversion is enabled, the metadata of those converted tables is also cached. If these tables are updated by Hive or other external tools, you need to refresh them manually to ensure consistent metadata.
from os.path import expanduser, join
from pyspark.sql import SparkSession
from pyspark.sql import Row
# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'spark-warehouse'
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()
# spark is an existing SparkSession
spark.sql("CREATE TABLE IF NOT EXISTS totti (key INT, value STRING)")
# Load some data into the table
spark.sql("LOAD DATA LOCAL INPATH 'path/to/the/table/totti.txt' INTO TABLE totti")
# Refresh the cached HMS metadata for the table
spark.catalog.refreshTable("totti")
# Queries are expressed in HiveQL
spark.sql("SELECT * FROM totti").show()
In the above example, you will need to connect to the database to create the table totti. Notice that I run the refresh before the SELECT so that the cached metadata is invalidated and fetched from the metastore; otherwise you will get a "table not found" error.
03-15-2021
12:59 PM
@emeric Can you copy and paste the new flume.conf? For clarity I have split out the different parts below.
Configuring the flume.conf:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter # Added
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = <keyword>
# Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
# Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000   # must be >= the sink's hdfs.batchSize
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
$ bin/flume-ng agent --conf ./conf/ -f /home/cloudera/flume.conf -n TwitterAgent
Please let me know if it runs successfully
03-15-2021
12:00 PM
@sandipkumar Think about it: Impala uses HMS, so remember that the Hive metastore database is required for Impala to function. If HMS is not running, then no Impala query/job can be launched. Hope that helps
03-15-2021
11:49 AM
@ryu How is your cluster set up? How many nodes, and which HDP version? Are you running your HQL from the edge node? Give as much information as possible.
03-14-2021
12:04 PM
@Jay2021 Impala and Hive share a metadata catalog, i.e. the Hive Metastore (HMS). When a database/table is created in Hive it is readily available to Hive users, but not to Impala! To successfully query a table or database created in Hive there is a caveat: you need to run INVALIDATE METADATA from the impala-shell before the table is available for Impala queries. INVALIDATE METADATA reloads all the metadata for the table needed for a subsequent query; if you query the new table from Impala without invalidating first, you will definitely run into errors. You could use REFRESH in the common case where you add new data files to an existing table: it reloads the metadata immediately, but only loads the block location data for the newly added data files, making it a less expensive operation overall.
INVALIDATE METADATA [[db_name.]table_name]
Example:
$ impala-shell
> INVALIDATE METADATA new_db_from_hive.new_table_from_hive;
> SHOW TABLES IN new_db_from_hive;
+---------------------+
| new_table_from_hive |
+---------------------+
That should resolve your issue. Happy hadooping
03-13-2021
02:11 PM
@SnehasishRSC REFRESH is used in the common case where you add new data files to an existing table: it reloads the metadata immediately, but only loads the block location data for the newly added data files, making it a less expensive operation overall. It is recommended to run COMPUTE STATS when 30% of the data in a table has been altered, where altered means the addition or deletion of files/data. INVALIDATE METADATA is a relatively expensive operation compared to the incremental metadata update done by the REFRESH statement, so in the common scenario of adding new data files to an existing table, prefer REFRESH rather than INVALIDATE METADATA, which marks the metadata for one or all tables as stale. The next time the Impala service performs a query against a table whose metadata has been invalidated, Impala reloads the associated metadata before the query proceeds. Hope that helps
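A minimal sketch from the impala-shell (the database and table names are placeholders, not yours):
$ impala-shell
> REFRESH my_db.my_table;                -- cheap: picks up newly added data files
> COMPUTE STATS my_db.my_table;          -- recommended after ~30% of the data has changed
> INVALIDATE METADATA my_db.my_table;    -- expensive: marks the table's metadata as stale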
03-01-2021
01:06 PM
@raghurok Bad news: as of February 1, 2021, all downloads of CDH and Cloudera Manager require a username and password and use a modified URL. You must use the modified URL, including the username and password, when downloading the Cloudera repository contents. Hope that helps
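For illustration only, the modified URL pattern embeds your paywall credentials, hypothetically along these lines (take the real URL, username and password from your Cloudera account):
https://<username>:<password>@archive.cloudera.com/p/...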
03-01-2021
12:58 PM
@ryu My advice is: just don't attempt it, because the HDP software is closely wired together. Rigorous unit testing and compatibility checks are performed before a version is certified. HDP is packaged software: when you update, it's either all or none; you can't update a single component, except Ambari and the underlying databases for Hive, Oozie, Ranger, etc. Yes, the good old days of real open source are gone. I loved HWX. If you are running production clusters, then you definitely need a subscription. Hope that helps
03-01-2021
12:41 PM
@totti1 You will need to copy the hdfs-site.xml/core-site.xml to a local path accessible to your Windows machine. And you will need to update your hosts file entry to make the VM reachable from the Windows machine; you should be able to ping your VM from the Windows machine and vice versa. Edit core-site.xml and hdfs-site.xml and change the FQDN:8020 to an IP, i.e. for a class C network something like 192.168.10.201:8020, restart the processors and let me know. Hope that helps?
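As an illustration only (the IP below is an example, not your actual address), the fs.defaultFS entry in core-site.xml would then look something like:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.10.201:8020</value>
</property>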
03-01-2021
10:24 AM
@Alex_IT From my Oracle knowledge, there are 2 options for migrating the same ORACLE_HOME [DB] from 12c to 19c. If you are running 12.1.0.2 then you have the direct upgrade path, see the attached matrix; with this option, you won't need to change the hostname. The other option is to export your current schemas (CM, Oozie, Hive, Hue, Ranger, etc.), install a fresh Oracle 19c box with an empty database, and import the old schemas. This could be a challenge, as you might have to rebuild indexes or recompile some database packages, but both options are doable. Hope that helps
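A rough sketch of the export/import route using Oracle Data Pump (the schema names, directory object and dump file names are placeholders, adjust them to your environment):
$ expdp system/<password> schemas=CM,HIVE,OOZIE,HUE,RANGER directory=DATA_PUMP_DIR dumpfile=cdh_schemas.dmp logfile=exp_cdh_schemas.log
# ...then on the new 19c host, after creating the users and tablespaces...
$ impdp system/<password> schemas=CM,HIVE,OOZIE,HUE,RANGER directory=DATA_PUMP_DIR dumpfile=cdh_schemas.dmp logfile=imp_cdh_schemas.log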
03-01-2021
10:11 AM
@totti1 Your NiFi cluster is not aware of your Hadoop cluster until you copy these 2 files from your cluster, /etc/hadoop/conf/hdfs-site.xml and /etc/hadoop/conf/core-site.xml, to your local NiFi installation and reference them in the HDFS processors' property: Hadoop Configuration Resources=/local/dir/hdfs-site.xml,/local/dir/core-site.xml. Look for this property on any of the HDFS processors in your processor group. Hope that helps
02-16-2021
11:04 PM
@rohit_sharma Can you change your syntax as below? Note the ZooKeeper ensemble:
/bin/kafka-topics.sh --create \
--zookeeper zk1:2181,zk2:2181,zk3:2181 \
--topic "topic_name" \
--partitions 1 \
--replication-factor 2
Hope that helps
01-12-2021
11:58 AM
@zetta4ever In a Hadoop cluster, three types of nodes exist: master, worker, and edge nodes. The distinction of roles helps maintain efficiency. Master nodes control which nodes perform which tasks and what processes run on what nodes; the majority of the work is assigned to worker nodes. Worker nodes store most of the data and perform most of the calculations. Edge nodes (aka gateway nodes) facilitate communication from end users to master and worker nodes.
The 3 master nodes should host the NameNode [Active & Standby], YARN [Active & Standby], the ZooKeeper quorum [3 masters] and the other components you intend to install, and on the 6 worker nodes (aka slave nodes) you will install the NodeManagers, DataNodes and all the clients. There is no need to install the clients on the master nodes.
Some nodes have important tasks, which may impact performance if interrupted. Edge nodes allow end users to contact worker nodes when necessary, providing a network interface for the cluster without leaving the entire cluster open to communication; that limitation improves reliability and security. As work is evenly distributed between worker nodes, the edge node's role helps avoid data skewing and performance issues. See my document on edge nodes: https://community.cloudera.com/t5/Support-Questions/Edge-node-or-utility-node-packages/td-p/202164# Hope that helps
01-05-2021
12:35 PM
@sass I just posted a response to a similar question and it should be valid for your case too. Folks are starting to miss Hortonworks, right? https://community.cloudera.com/t5/Support-Questions/CDH-Express-edition-be-affected-with-Paywall-subscription/td-p/308786 Happy hadooping!!! Was your question answered? If so make sure to mark the answer as the accepted solution. If you find a reply useful, Kudos this answer by hitting the thumbs up button.
01-05-2021
12:28 PM
@Ninads Here is a community article by @kramalingam, "Connecting to Kerberos secured HBase cluster from Java application"; it's a walkthrough that should give you ideas. Was your question answered? If so make sure to mark the answer as the accepted solution. If you find a reply useful, Kudos this answer by hitting the thumbs up button.
01-05-2021
12:14 PM
@sass You should get worried if you are using CDH Express, because once the trial period expires, a valid subscription will be required to continue using the software. This blanket change of policy will affect all legacy versions of the Cloudera distributions, including Apache Hadoop (CDH), Hortonworks Data Platform (HDP), Data Flow (HDF/CDF), and Cloudera Data Science Workbench (CDSW). Here is a good read from Cloudera with the details of what you should know and expect come January 31, 2021: Paywall Expansion Update. Happy hadooping. Was your question answered? If so make sure to mark the answer as the accepted solution. If you find a reply useful, Kudos this answer by hitting the thumbs up button.