Member since: 11-21-2017
Posts: 70
Kudos Received: 5
Solutions: 0
01-14-2021
11:54 PM
Hi ravikirandasar1, I also have the same query. Could you please let me know how you automated this job using crontab to download the files to HDFS every day?
01-10-2019
01:15 PM
I have currently set up HDF 3.3.1 with NiFi on a standalone machine. I want to go ahead and install HDFS for storage purposes. Can I work with the latest version of HDP? Please advise! @Matt Burgess
04-16-2018
11:03 AM
Hi Salvator, I'm facing the same problem. Did you find any solution for this?
01-23-2018
12:35 PM
1 Kudo
@Ravikiran Dasari Please accept the answer if it addresses your query 🙂 or let me know if you need any further information.
01-15-2018
12:00 PM
@Jay Kumar SenSharma Thank you. Do you have any idea about installing NiFi on an HDP cluster?
01-10-2018
03:10 PM
@Ravikiran Dasari Create a Sqoop job for your import:

sqoop job --create <job-name> -- import --connect "jdbc:sqlserver://10.21.29.15:1433;database=db;username=ReportingServices;password=ReportingServices" --check-column batchid --incremental append -m 1 --hive-table mmidwpresentation.journeypositions_archive --table JourneyPositions --hive-import --schema safedrive

Once you create the Sqoop job, Sqoop stores the last value of batchid (the --check-column argument); whenever you run the job again, Sqoop pulls only the new records after that last saved value.

Sqoop job arguments ($ sqoop job ...):
--create <job-name>  Define a new saved job with the specified job-id (name). A second Sqoop command line, separated by a --, should be specified; this defines the saved job.
--delete <job-name>  Delete a saved job.
--exec <job-name>    Given a job defined with --create, run the saved job.
--show <job-name>    Show the parameters for a saved job.
--list               List all saved jobs.
These are the arguments you can use with the sqoop job command to execute, list, delete jobs, etc. Use the --password-file option to set the path to a file containing the authentication password when creating Sqoop jobs.
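For example, a minimal sketch of creating and running such a saved job with --password-file instead of an inline password (the HDFS path and job name below are made up for illustration):

# store the DB password in a protected file on HDFS (hypothetical path)
echo -n "ReportingServices" > sqoop.password
hdfs dfs -put sqoop.password /user/sqoop/sqoop.password
hdfs dfs -chmod 400 /user/sqoop/sqoop.password

# create the saved job; Sqoop remembers the last batchid between runs
sqoop job --create journeypositions_incremental -- import \
  --connect "jdbc:sqlserver://10.21.29.15:1433;database=db" \
  --username ReportingServices \
  --password-file /user/sqoop/sqoop.password \
  --table JourneyPositions --schema safedrive \
  --check-column batchid --incremental append -m 1 \
  --hive-import --hive-table mmidwpresentation.journeypositions_archive

# run it now (or on a schedule from cron/Oozie)
sqoop job --exec journeypositions_incremental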
01-09-2018
08:58 AM
@Ravikiran Dasari: You can see all hive parameters with "hive -H":

hive -H
usage: hive
-d,--define <key=value> Variable substitution to apply to Hive
commands. e.g. -d A=B or --define A=B
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
-h <hostname> Connecting to Hive Server on remote host
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable substitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-p <port> Connecting to Hive Server on port number
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the
console)

You can add two or more tables to the same schema if they have different names (which will be the case if you use the timestamp). If you are running your create script in parallel, you can always just get a new timestamp in case a table with the timestamp you already have exists. If needed, you can add the date stamp as well: curr_timestamp=`date +%Y%m%d_%s`
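A minimal sketch of that idea, assuming a hypothetical table named staging_<timestamp>:

# build a unique suffix from the date stamp plus epoch seconds
curr_timestamp=`date +%Y%m%d_%s`

# pass it in with --hivevar; single quotes keep the shell from touching ${hivevar:ts}
hive --hivevar ts="${curr_timestamp}" -e 'CREATE TABLE staging_${hivevar:ts} (id INT, payload STRING)'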
12-12-2017
01:08 AM
No. 840 GB means a single node has almost 120 GB of RAM, and allocating all of it is not an ideal way to run the system, because each node needs some free memory for other services such as OS processes and the agents used by Ambari. Just start with 90 GB to 100 GB, then adjust it slightly from there.
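If this refers to the YARN NodeManager memory allocation (an assumption; the property name below is not from the original post), the corresponding yarn-site.xml setting for roughly 96 GB per node would look like:

<!-- memory (in MB) YARN may hand out to containers on this node;
     the remaining RAM stays free for the OS, Ambari agents, etc. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>98304</value>
</property>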
12-06-2017
05:16 PM
Yes, continuously and automatically. By default it polls for new files every 60 seconds; you can shrink that. You can also convert those files to Apache ORC and automatically build new Hive tables on them if the files are CSV, TSV, Avro, Excel, JSON, XML, EDI, HL7 or C-CDA. Install Apache NiFi on an edge node; there are ways to combine HDP 2.6 and HDF 3 with the new Ambari, but it's easiest to start with a separate node for Apache NiFi. You can also just download NiFi, unzip it, and run it on a laptop that has JDK 8 installed: https://www.apache.org/dyn/closer.lua?path=/nifi/1.4.0/nifi-1.4.0-bin.zip
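For example, a minimal sketch of the laptop route (the archive mirror URL and default port are assumptions based on a standard NiFi 1.4.0 install):

# download, unpack and start NiFi 1.4.0 (needs JDK 8 on the PATH)
wget https://archive.apache.org/dist/nifi/1.4.0/nifi-1.4.0-bin.zip
unzip nifi-1.4.0-bin.zip
cd nifi-1.4.0
./bin/nifi.sh start
# the UI then comes up at http://localhost:8080/nifi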
12-14-2018
01:05 PM
Hi @Jordan Moore, what option would you suggest if you have 100 different SFTP sources with 10-15 files in each of them? Configuring individual NiFi processes is not an option here. I've played around with NiFi processors and they are not very good at working with parameters. Would Spark be a good solution for my case? Thanks, Farid