Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 483 | 06-04-2025 11:36 PM |
|  | 1013 | 03-23-2025 05:23 AM |
|  | 537 | 03-17-2025 10:18 AM |
|  | 2013 | 03-05-2025 01:34 PM |
|  | 1259 | 03-03-2025 01:09 PM |
03-18-2021 02:00 AM
@emeric Cool, so how will you send it safely? Check my LinkedIn; it should be easy to connect. 🙂
03-17-2021 02:22 PM
@emeric Twitter has made it difficult to register any app, so I am waiting for approval. Honestly, I couldn't stand the 200-word essay explaining what I intend to do, whether I am part of a government, etc. I just copied and pasted some text from a website, so I hope I pass the review 🙂 By Friday I should be good to go.
03-17-2021 12:01 AM
@emeric Could you try substituting the current value in flume.conf with each of the paths below?

hdfs://10.0.2.15:8020/user/flume/tweets/
hdfs://127.0.0.1:8020/user/flume/tweets/
hdfs://192.168.56.101:8020/user/flume/tweets/

Let me know.
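To be explicit, only the HDFS sink path needs to change between attempts; a minimal sketch, assuming the sink is still named HDFS as in the configuration I posted earlier:

# Try each candidate address in turn; only this line changes between attempts.
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://10.0.2.15:8020/user/flume/tweets/
# TwitterAgent.sinks.HDFS.hdfs.path = hdfs://127.0.0.1:8020/user/flume/tweets/
# TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.56.101:8020/user/flume/tweets/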
03-16-2021 01:26 PM
@emeric What is the output of the command below from the Quickstart sandbox CLI?

$ ifconfig

I think we are on the right path. If you don't succeed, I will download a sandbox tomorrow and try to reproduce your situation. Happy hadooping.
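As a shortcut, you can narrow the output to just the address lines; a minimal sketch, assuming the sandbox's primary interface is eth0 (adjust the interface name if yours differs):

# Print only the inet address lines for the primary interface
$ ifconfig eth0 | grep "inet"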
03-16-2021 04:48 AM
@emeric That looks like a hostname issue. This looks like the offending line:

TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/

Can you replace quickstart.cloudera:8020/user/flume/tweets/ with <Sandbox-IP>:8020/user/flume/tweets/? Please let me know.
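In other words, the sink path line would become the following, where <Sandbox-IP> is a placeholder for the sandbox's actual IP address (for example, the one reported by ifconfig):

# Substitute the sandbox IP for the quickstart.cloudera hostname
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://<Sandbox-IP>:8020/user/flume/tweets/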
03-15-2021 01:49 PM
@ryu Try the following solution. Always note all the changes you make in case you need to revert them. Follow these steps to resolve the issue:
1. Open Ambari.
2. Go to TEZ / Configs / Advanced tez-site.
3. Locate the configuration tez.history.logging.service.class.
4. Replace the value org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService with the new value org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.
5. Save the configuration changes.
6. Restart all services that Ambari asks you to restart.

Then retry:

[root@test02 ~]# hive

Please revert.
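For reference, here is what that setting corresponds to in the underlying tez-site.xml; this is just a sketch for context, and on an Ambari-managed cluster the change should still be made through Ambari so it is not overwritten:

<!-- tez-site.xml: switch the Tez history logging service away from ATS v1.5 -->
<property>
  <name>tez.history.logging.service.class</name>
  <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>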
03-15-2021 01:36 PM
@totti1 This is all about HMS (Hive Metastore) metadata refreshing. Spark SQL caches Parquet metadata for better performance. When Hive metastore Parquet table conversion is enabled, the metadata of those converted tables is also cached. If these tables are updated by Hive or other external tools, you need to refresh them manually to ensure consistent metadata.

from os.path import expanduser, join
from pyspark.sql import SparkSession
from pyspark.sql import Row
# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'spark-warehouse'
spark = SparkSession \
.builder \
.appName("Python Spark SQL Hive integration example") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.enableHiveSupport() \
.getOrCreate()
# spark is an existing SparkSession
spark.sql("CREATE TABLE IF NOT EXISTS totti (key INT, value STRING)")
# Load some data here
spark.sql("LOAD DATA LOCAL INPATH 'path/to/the/table/totti.txt' INTO TABLE totti")
# Refresh Spark's cached metadata for the table (spark is the existing SparkSession)
spark.catalog.refreshTable("totti")
# Queries are expressed in HiveQL
spark.sql("SELECT * FROM totti").show() In the above example, you will need to connect to the database to create the table totti. Notice I run the refresh before the select so that the Metadata is invalidated and fetched from the databases else I will get no table found etc
03-15-2021 12:59 PM
@emeric Can you copy and paste the new flume.conf? For clarity I have split out the different parts.

Flow Diagram

Configuring the flume.conf

# Naming the components on the current agent (the Twitter source was added)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = <keyword>
# Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
# Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
$ bin/flume-ng agent --conf ./conf/ -f /home/cloudera/flume.conf -n TwitterAgent

Please let me know if it runs successfully.
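Once the agent is up, a quick way to confirm that the sink is actually writing is to list the target directory; a minimal sketch, assuming the hdfs.path from the configuration above (the HDFS sink names its files FlumeData.* by default):

# New files should appear here as events are collected and rolled
$ hdfs dfs -ls /user/flume/tweets/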
03-15-2021 12:00 PM
@sandipkumar Think about it: Impala uses HMS, and the Hive metastore database is required for Impala to function. So if HMS is not running, no Impala query/job should be launched. Hope that helps.
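As a quick sanity check before launching Impala jobs, you can test whether the metastore service is reachable; a minimal sketch, where metastore-host is a placeholder and 9083 is the usual default HMS Thrift port (adjust both for your cluster):

# Exit status 0 means the Hive Metastore port is accepting connections
$ nc -z metastore-host 9083 && echo "HMS reachable" || echo "HMS not reachable"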
03-15-2021 11:49 AM
@ryu How is your cluster set up? How many nodes, and which HDP version? Are you running your HQL from the edge node? Give as much information as possible.