Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3993 | 10-18-2017 10:19 PM
 | 4255 | 10-18-2017 09:51 PM
 | 14631 | 09-21-2017 01:35 PM
 | 1773 | 08-04-2017 02:00 PM
 | 2357 | 07-31-2017 03:02 PM
04-10-2017
12:19 PM
@rama Is the following value set to true: keep.failed.task.files (MRv1) or mapreduce.task.files.preserve.failedtasks (MRv2)? If yes, that could be the reason the staging files are not being deleted. Set it to false and delete the files manually, but do not delete files for a currently running job. In rare instances a job failure can also leave staging files behind, and you may find their remnants here. These are just temporary MapReduce files; if no job is currently running, you can safely delete them and reclaim the space. When you delete these files, make sure they don't end up in the trash folder (use the -skipTrash option, or delete them from the trash folder afterwards).
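As a minimal sketch, the cleanup could look like the following; the staging path shown is the usual MRv2 default and may differ on your cluster, so verify it first.

```bash
# Make sure the preserve flag is off in mapred-site.xml (MRv2):
#   <property>
#     <name>mapreduce.task.files.preserve.failedtasks</name>
#     <value>false</value>
#   </property>
# With no job running, remove the leftover staging directories and bypass the trash.
# The path below is an assumption; check yarn.app.mapreduce.am.staging-dir on your cluster.
hdfs dfs -rm -r -skipTrash /tmp/hadoop-yarn/staging/<username>/.staging/job_*
```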
04-03-2017
05:38 PM
1 Kudo
@Revathy Mourouguessane In your Hive table properties you can specify skip.footer.line.count to remove the footer from your data. If you have just a one-line footer, set this value to 1. You specify it in the table properties of your CREATE TABLE statement: tblproperties("skip.header.line.count"="1", "skip.footer.line.count"="1");
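For illustration, here is a hedged sketch of a full CREATE TABLE; the table name, columns, location, and JDBC URL are made up.

```bash
# Sketch only: skips a one-line header and a one-line footer in the source files.
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE sales_raw (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales_raw'
TBLPROPERTIES ('skip.header.line.count'='1', 'skip.footer.line.count'='1');"
```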
04-03-2017
02:22 PM
@Bala Vignesh N V Then it is likely a permission issue. Check the permissions on the .Trash folder, and also any Ranger policies for the user who is running DROP TABLE.
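A quick way to check, as a sketch (the username is illustrative):

```bash
# Verify ownership and permissions of the user's trash directory.
hdfs dfs -ls -d /user/<username>/.Trash
# Look at what is already under it.
hdfs dfs -ls /user/<username>/.Trash
```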
04-03-2017
02:05 PM
@Bala Vignesh N V If your table is not a Hive managed table (data under the Hive warehouse directory), in other words when you create an external table, then dropping the table does not delete the data. Data is deleted on DROP TABLE only for Hive managed tables.
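To see the difference, a small sketch (table names, location, and JDBC URL are made up):

```bash
# External table: DROP removes only the metadata; the files stay in place.
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE events_ext (id INT) LOCATION '/data/events_ext';
DROP TABLE events_ext;"
hdfs dfs -ls /data/events_ext        # files are still there

# Managed table: DROP removes the metadata and the warehouse files.
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE TABLE events_managed (id INT);
DROP TABLE events_managed;"
```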
04-01-2017
08:28 PM
@sherri cheng Do you mean you have created tables called "drivers", "driver1", etc. and now want to get rid of the tables and their associated data? Are the folders created under the "/usr/hive/warehouse" directory? Have you used the following? DROP TABLE [IF EXISTS] table_name [PURGE]; --> DROP TABLE IF EXISTS drivers PURGE;
Then run it again for the other tables, as in the sketch below.
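A sketch, assuming the tables are named drivers and driver1 and the JDBC URL is illustrative:

```bash
# Drop each table and purge its data (PURGE bypasses the trash).
beeline -u jdbc:hive2://localhost:10000 -e "
DROP TABLE IF EXISTS drivers PURGE;
DROP TABLE IF EXISTS driver1 PURGE;"
# Confirm the corresponding warehouse folders are gone.
hdfs dfs -ls /usr/hive/warehouse
```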
03-31-2017
09:34 PM
@yvora Actually, if you want to run this as user "a" and not as the principal, the command changes: you still do the kinit as you said, but then you also provide --proxy-user.
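Roughly like this, as a sketch; the keytab path, principal, class, and jar are assumptions:

```bash
# Authenticate as the service principal first...
kinit -kt /etc/security/keytabs/myservice.keytab myservice@EXAMPLE.COM
# ...then submit the job on behalf of user "a".
spark-submit --proxy-user a --class com.example.MyApp myapp.jar
```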
03-31-2017
09:08 PM
1 Kudo
@Kevin Ng Can you run a kinit before running the spark command?
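Something like this, as a sketch (the principal, class, and jar are illustrative):

```bash
kinit <user>@EXAMPLE.COM       # obtain a Kerberos ticket
klist                          # confirm the ticket is valid
spark-submit --class com.example.MyApp myapp.jar
```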
03-31-2017
05:27 PM
@Lucy zhang
Please try the following: --map-column-java isactive=Integer or --map-column-java isactive=String. Also try --map-column-hive isactive=STRING or --map-column-hive isactive=INT.
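For context, a hedged sketch of a full Sqoop import using these options; the connection details, credentials, and table name are assumptions:

```bash
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl \
  --password-file /user/etl/.dbpassword \
  --table customers \
  --map-column-java isactive=Integer \
  --hive-import \
  --hive-table customers \
  --map-column-hive isactive=INT
```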
03-30-2017
09:32 PM
1 Kudo
@kkanchu You are reading the defaults for MRv1. With YARN/MRv2, mapreduce.cluster.local.dir has been replaced by yarn.nodemanager.local-dirs. This property uses your local disks for storing temporary files. I have not tried mapreduce.cluster.temp.dir, but it seems to me the difference is that it is a location in HDFS rather than on the local file system. You can try running a small sample job and see the difference.
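To see what your cluster actually uses, a sketch (the config path assumes an HDP-style layout and the example value is illustrative):

```bash
# Show the local directories the NodeManager uses for intermediate/temporary files.
grep -A2 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml
# A typical value is a comma-separated list of local disks, e.g.
#   /grid/0/hadoop/yarn/local,/grid/1/hadoop/yarn/local
```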
03-29-2017
05:17 PM
1 Kudo
@sushil nagur I agree with both @Graham Martin and @ccasano. Instead of talking about the tools, which you already know from the answers above, I'll talk about why CIOs prefer Hortonworks for offloading their existing ETL jobs. As Graham mentions, we have partners like Informatica, Talend, Pentaho, and Syncsort that you can use to write your ETL jobs in Hadoop. What this gives you is faster time to market, which is the same story as with previous ETL tools: they save you from writing code and building your ETLs manually, and they prevent bugs you might introduce if you wrote your own code. Under the hood, they use similar technologies like Spark and MapReduce, and even the same fast connectors that Sqoop uses.

So why use Hortonworks? Consider where the storage engine is, where all the processing actually happens. Without Hortonworks, on legacy/existing systems, CIOs pay a significantly higher cost per TB for ETL. Some companies even do ELT, which means they first load data into their data warehouse and then use the processing power of that system to perform the transformations. This takes very expensive resources away from the reporting and ad hoc queries that the EDW was purchased for in the first place. When you offload those jobs onto Hadoop, you free up all that capacity and processing power for reporting and business use. Your per-TB cost of doing ETL in Hadoop is a fraction of what it is in traditional ETL systems, and that is the main motivation for offloading ETL to Hadoop. You perform the ETL in Hadoop and then push your final result into your EDW.