Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4044 | 10-11-2017 09:33 PM |
| | 3566 | 10-11-2017 07:46 PM |
| | 2571 | 08-04-2017 01:37 PM |
| | 2214 | 08-03-2017 03:36 PM |
| | 2242 | 08-03-2017 12:52 PM |
05-25-2017
03:46 PM
1 Kudo
The former. You are expected to write the code to run in the Spark shell. Also, take a look at this link with other good questions/answers regarding the exam: https://community.hortonworks.com/questions/70180/hdpcd-spark-exam.html
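To be clear about the format: "run in the Spark shell" means typing code interactively at the REPL prompt rather than submitting a packaged application. A minimal sketch of what that looks like (the HDFS path is a hypothetical example):

```
# Launch the interactive Spark shell (Scala REPL):
spark-shell

# Then type your code directly at the scala> prompt, for example:
#   val lines = sc.textFile("/user/cert/input.txt")   // hypothetical path
#   lines.filter(_.contains("error")).count()
```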
05-25-2017
03:36 PM
Response(s) provided here: https://community.hortonworks.com/questions/104805/hdp-sandbox-26-hivehive-2-select-count-or-select-d-2.html
05-25-2017
03:34 PM
1 Kudo
@Lei Yin I faced a similar issue in the past and solved it by adjusting the Hive settings in the sandbox as follows:

1) In Ambari, select "Hive" from the left menu, then the "Configs" tab and the "Settings" sub-tab.
2) Scroll to the bottom of the page and modify "HiveServer2 Heap Size" and "Metastore Heap Size", as well as any other flagged items (possibly "Memory for Map Join"). If you hover next to each item, Ambari will recommend values to set, so feel free to use those by selecting the "set recommended" icon that appears.
3) Save and click "Restart affected" services near the top of the page.

Try the above and let us know if it works or not. As always, if you find any post here helpful, don't forget to "accept" an answer.
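If you'd rather script the change, Ambari ships a small helper for editing configs from the command line. A minimal sketch, assuming sandbox defaults (admin/admin credentials, a cluster named "Sandbox") and that the heap settings live in hive-env under the keys hive.heapsize and hive.metastore.heapsize (verify the exact keys in your Ambari version before running):

```
# Hedged sketch: update Hive heap sizes via Ambari's bundled configs.sh helper.
# Script path, cluster name, credentials, and property keys are assumptions;
# check them against your own sandbox first.
CONFIGS=/var/lib/ambari-server/resources/scripts/configs.sh

$CONFIGS -u admin -p admin set localhost Sandbox hive-env hive.heapsize 1024
$CONFIGS -u admin -p admin set localhost Sandbox hive-env hive.metastore.heapsize 1024

# Afterwards, restart the affected Hive services from the Ambari UI.
```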
05-25-2017
02:22 PM
Please take a look at @Matt Clarke's response above on how to extract CSV files only. It is the most straightforward way.
05-24-2017
06:18 PM
You can't. Sqoop only transfers data between an RDBMS and HDFS (in either direction); it does not work with other file system interfaces.
05-24-2017
06:04 PM
@Andres Urrego Neither. Just use the "--warehouse-dir" flag with "import-all-tables". The directory you specify does not need to be a Hive warehouse directory; it can be any location you choose in HDFS.

The reason you're unable to use "--target-dir" is that the option is only available when all the imported data is to be placed in one particular folder, whereas "import-all-tables" needs to create a subfolder for each table. The "--warehouse-dir" flag only indicates the parent folder where you want all the data to go, and "import-all-tables" creates a subdirectory under it for each table brought in.

I've assumed with the above that you want to import all tables. However, if you only want to import a few tables, then your best bet is to write a (shell/Perl/Python/etc.) script that runs multiple Sqoop commands, each one importing a single table (see the sketch below). Does that clarify things?
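A minimal sketch of such a script, assuming a MySQL source (the JDBC URL, credentials, table names, and paths are illustrative assumptions):

```
#!/bin/bash
# Run one Sqoop import per table; each table gets its own --target-dir.
# Connection details and the table list below are assumptions for illustration.
TABLES="customers orders products"

for T in $TABLES; do
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/mydb \
    --username sqoopuser -P \
    --table "$T" \
    --target-dir "/user/hdfs/staging/$T"
done
```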
05-24-2017
04:00 PM
1 Kudo
@Andres Urrego "import-all-tables" does not support "--target-dir". As you've discovered, "--warehouse-dir" should be used instead. Data for each table will be put in a subfolder in the designated warehouse-dir path. As always, if you find this post helpful, don't forget to "accept" answer.
05-24-2017
12:58 PM
1 Kudo
@Tinkle Mahendru Take a look at the example NiFi flow template in the link below (SplitRouteMerge.xml): https://cwiki.apache.org/confluence/download/attachments/57904847/SplitRouteMerge.xml?version=1&modificationDate=1441745127000&api=v2

This flow demonstrates splitting files on line boundaries, routing the splits based on a regex in the content, and then merging the files back together for storage somewhere. It will give you a good idea of how to process and merge your files.
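The template boils down to a three-processor chain. A rough sketch of the key properties (processor and property names per the NiFi docs; the regex and threshold values are illustrative assumptions):

```
# SplitText           -> split incoming files into one-line FlowFiles
#   Line Split Count  : 1
#
# RouteOnContent      -> route each split by matching a regex against its content
#   Match Requirement : content must contain match
#   errors (user-added property) : (?i)error.*    # hypothetical routing regex
#
# MergeContent        -> bundle the routed splits back into larger files
#   Merge Strategy            : Bin-Packing Algorithm
#   Minimum Number of Entries : 1000               # illustrative threshold
```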
05-24-2017
12:33 PM
1 Kudo
@Narasimma varman Use the PutSQL and ExecuteSQL processors. You can read more about them and their usage at https://nifi.apache.org/docs.html. A typical configuration for each is sketched below.

Also, to get a better idea of how to chain the processors, take a look at the following article for an example flow that ingests data into a relational database using NiFi: https://www.batchiq.com/database-ingest-with-nifi.html

As always, if you find this post helpful, don't forget to "accept" the answer.
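A rough sketch of the key properties (property names per the NiFi docs; the connection-pool name and SQL are illustrative assumptions):

```
# ExecuteSQL  -> run a SELECT and emit the results as FlowFiles (Avro)
#   Database Connection Pooling Service : DBCPConnectionPool   # controller service you define
#   SQL select query                    : SELECT * FROM source_table
#
# PutSQL      -> execute INSERT/UPDATE statements carried in FlowFile content
#   JDBC Connection Pool : DBCPConnectionPool
#   Batch Size           : 100
```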