Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4067 | 08-20-2018 08:26 PM |
|  | 1963 | 08-15-2018 01:59 PM |
|  | 2390 | 08-13-2018 02:20 PM |
|  | 4140 | 07-23-2018 04:37 PM |
|  | 5046 | 07-19-2018 12:52 PM |
02-13-2017
05:11 AM
3 Kudos
The easiest option is Apache NiFi: use the UI to move data from any RDBMS into a Hive table. You can also use the Hive streaming option with Apache NiFi. Full details on how to do this are here: https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070.html
Alternatively, you can use Sqoop. Full details on how to sqoop data from an RDBMS to Hive/Hadoop and back are here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_move_data_into_hive.html
Just as an example, to move an entire table from MySQL into a Hive table named EMPLOYEES:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES
Or only the latest data:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "start_date > '2010-01-01'"
Or using a query:
sqoop import --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults
By default I would use NiFi, since it is the easiest way to get data into Hive/Hadoop.
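The examples above land the data in HDFS. To have Sqoop create and load the Hive table in the same run, the standard --hive-import and --hive-table options can be added; a minimal sketch, reusing the placeholder connection string and table from above (verify the options against your Sqoop version):
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --hive-import --hive-table EMPLOYEES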
02-13-2017
02:33 AM
I am not sure about the Cloudera VM; my CDH VM has faulted on me several times, so I can't speak to that point. However, what you can do is go into the services you are using and set log4j properties to limit how much log history is retained. For example, for Kafka:
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=9
This will only allow 9 backup logs. Here is a good article on how to control log sizes and retention for each HDP service: https://community.hortonworks.com/content/kbentry/8882/how-to-control-size-of-log-files-for-various-hdp-c.html
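As an illustration of the same pattern for another service, here is a sketch for the Hadoop RFA appender; the property names follow the standard Hadoop log4j template, and the values are illustrative:
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10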
02-12-2017
02:52 PM
Yes, you can. This requires state to keep track of the number. Use a distributed map cache (DMC) to fetch and put your sequence; your DMC put would be the existing number plus 1.
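A minimal sketch of one way to wire this up, assuming the standard FetchDistributedMapCache and PutDistributedMapCache processors backed by a DistributedMapCacheClientService (key and attribute names are illustrative, and concurrent updates are not handled here):
FetchDistributedMapCache (Cache Entry Identifier = seq.counter, Put Cache Value In Attribute = current.seq)
--> UpdateAttribute (next.seq = ${current.seq:plus(1)})
--> ReplaceText (replacement value = ${next.seq}, since PutDistributedMapCache caches the FlowFile content)
--> PutDistributedMapCache (Cache Entry Identifier = seq.counter)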
02-11-2017
06:40 PM
1 Kudo
The sandbox plays the roles of Ambari, edge, master, and data node all on one machine. It is set up to get you up and running quickly to learn the Hadoop stack. In a production environment, you would separate the Ambari, edge, and master services (one or more on each node) and run some number of data nodes (minimum 3). You would scale your data nodes based on the compute and storage required for your workload.
02-11-2017
06:38 PM
1 Kudo
I can't say for sure why, but I would recommend suspending your VM when not in use; it is not designed to run as a long-running instance. To save your work, don't turn off the box, but instead suspend it. This makes things much easier when you resume activities.
02-11-2017
03:34 AM
On your sandbox, please confirm the Atlas service is running. If it is not, please enable it and rerun the Sqoop command above.
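One quick way to check is the Ambari REST API; this sketch assumes the sandbox defaults of port 8080, admin/admin credentials, and a cluster named Sandbox (adjust for your environment):
curl -u admin:admin http://localhost:8080/api/v1/clusters/Sandbox/services/ATLAS
The service state in the response should show STARTED.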
02-11-2017
03:31 AM
1 Kudo
The dbgen.jar was not created during the build. Please verify you have gcc installed.
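To check whether gcc is present, and to install it assuming a yum-based sandbox image:
gcc --version
yum install -y gcc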
02-11-2017
02:49 AM
1 Kudo
OK, I found what I was doing wrong. The PostHTTP processor has a Compression Level property. If you set this value to > 0, it will compress the content as gzip, so there is no reason to compress prior to PostHTTP if you are using gzip.
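For reference, a sketch of the simplified flow this implies, using the processors from this thread (values are illustrative):
Data --> PostHTTP (Compression Level = 2) --> ListenHTTP endpoint --> PutFile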
02-10-2017
10:53 PM
Here is my workflow:
Data --> CompressContent (gzip, level 2) --> PostHTTP, which posts to the ListenHTTP endpoint
ListenHTTP --> CompressContent (gzip, level 2, for decompression) --> PutFile
For decompression I set the compression format to gzip and level 2. However, when I look at the file post-decompression, it is still compressed. Any ideas?
By the way, I tested the CompressContent processor with Data --> CompressContent (gzip, level 2) --> CompressContent (gzip, level 2, for decompression), and it compresses and decompresses the data successfully. So this seems to be an issue around ListenHTTP feeding into CompressContent for decompression.
Labels:
- Apache NiFi
02-10-2017
06:54 PM
I used UpdateAttribute to create an attribute called compressvalue and set it to 2. Then I referenced this attribute in the PostHTTP processor's Compression Level property as $(compressvalue:toNumber()). The processor fails to validate, complaining my attribute is not an integer. Any ideas?
Labels:
- Apache NiFi