Member since
10-06-2016
40
Posts
1
Kudos Received
0
Solutions
06-01-2017
08:11 AM
I am computing a correlation on a CSV file using Apache Spark. When loading the data I have to skip the first row, which is the header holding the column names; otherwise the data cannot be loaded. The correlation is computed fine, but the resulting correlation matrix has lost the column names, and I cannot find a way to add them back as a header on the new matrix. Could you please help me get the matrix with its header? Thanks. This is what I did:

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.rdd.RDD

val data = sc.textFile(strfilePath).mapPartitionsWithIndex { case (index, iterator) =>
  if (index == 0) iterator.drop(1) else iterator
}
val inputMatrix = data.map { line =>
  val values = line.split(",").map(_.toDouble)
  Vectors.dense(values)
}
val correlationMatrix = Statistics.corr(inputMatrix, "pearson")
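One way to get the header back, as a sketch building on the snippet above (it reuses `sc` and `strfilePath` from the question): the MLlib `Matrix` type carries no column names, so the names are read from the file's first line and applied when printing the matrix.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

val lines = sc.textFile(strfilePath)
val header: Array[String] = lines.first().split(",")  // column names from row 1

val data = lines.mapPartitionsWithIndex { case (index, iterator) =>
  if (index == 0) iterator.drop(1) else iterator
}
val inputMatrix = data.map(line => Vectors.dense(line.split(",").map(_.toDouble)))
val m = Statistics.corr(inputMatrix, "pearson")

// Print the correlation matrix with the header as both row and column labels.
println(header.mkString("\t", "\t", ""))
for (i <- 0 until m.numRows) {
  val row = (0 until m.numCols).map(j => f"${m(i, j)}%.4f").mkString("\t")
  println(s"${header(i)}\t$row")
}
```

Since `Statistics.corr` keeps the input column order, `header(i)` labels row and column i directly.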
Labels:
- Apache Spark
03-03-2017
09:21 AM
I have four CSV files, and I want to join and merge them into a single file based on a timestamp column, using Spark or Hadoop. Any help would be appreciated.
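A minimal Spark sketch of one way to do this, assuming four files with hypothetical names (file1.csv through file4.csv) and a shared column named `timestamp` (both the paths and the column name are placeholders, not taken from the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").appName("join-csv").getOrCreate()

// Read each file into its own DataFrame.
val dfs = Seq("file1.csv", "file2.csv", "file3.csv", "file4.csv").map { path =>
  spark.read.option("header", "true").option("inferSchema", "true").csv(path)
}

// Fold the four frames into one by joining on the timestamp column; an
// outer join keeps rows whose timestamp appears in only some of the files.
val joined = dfs.reduce((a, b) => a.join(b, Seq("timestamp"), "outer"))

joined.coalesce(1).write.option("header", "true").csv("joined_output")
```

If the files share the full schema and "merge" means stacking rows rather than matching them, `dfs.reduce(_ union _)` would replace the join.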
Labels:
- Apache Hadoop
- Apache Spark
03-02-2017
09:14 AM
Hi guys, I am trying to save a DataFrame that contains a timestamp column to a CSV file. The problem is that this column changes format once written to the CSV file: when showing via df.show I get the correct format, but when I check the CSV file I get a different one. I also tried something like this and still got the same problem:

finalresult.coalesce(1).write.option("header", true).option("inferSchema", "true").option("dateFormat", "yyyy-MM-dd HH:mm:ss").csv("C:/mydata.csv")

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header", true).option("inferSchema", "true").csv("C:/Users/mhattabi/Desktop/dataTest2.csv")
//val df = spark.read.option("header", true).option("inferSchema", "true").csv("C:\dataSet.csv\datasetTest.csv")

// convert all columns to numeric values in order to apply aggregation functions
df.columns.map { c => df.withColumn(c, col(c).cast("int")) }

// add a new column holding the timestamp rounded to 5-minute buckets
val result2 = df.withColumn("new_time", ((unix_timestamp(col("time")) / 300).cast("long") * 300).cast("timestamp")).drop("time")
val finalresult = result2.groupBy("new_time").agg(result2.drop("new_time").columns.map((_ -> "mean")).toMap).sort("new_time") // agg(avg(all columns...))
finalresult.coalesce(1).write.option("header", true).option("inferSchema", "true").csv("C:/mydata.csv")
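For what it's worth, in Spark 2.x the CSV writer formats timestamp columns via the "timestampFormat" option ("dateFormat" only applies to date columns, and "inferSchema" is a read-side option with no effect on write), so a sketch of the corrected write call, reusing the `finalresult` frame from the snippet above, would be:

```scala
// Format the timestamp column on write with "timestampFormat".
finalresult.coalesce(1)
  .write
  .option("header", "true")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .csv("C:/mydata")
```

Note that .csv(...) writes a directory of part files, so the path names a folder rather than a single file.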
Labels:
- Apache Spark
03-01-2017
09:17 AM
I have done this code. My first question is about the cast function: how can I cast the data type of all columns in the dataset at the same time, except the timestamp column? The other question is how to apply the avg function to all columns, again except the timestamp column. Thanks a lot.

val df = spark.read.option("header", true).option("inferSchema", "true").csv("C:/Users/mhattabi/Desktop/dataTest.csv")
val result = df.withColumn("new_time", ((unix_timestamp(col("time")) / 300).cast("long") * 300).cast("timestamp"))
result("value").cast("float") // here the first question
val finalresult = result.groupBy("new_time").agg(avg("value")).sort("new_time") // here the second question about avg
finalresult.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("C:/mydata.csv")
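A minimal sketch of both steps, assuming the `result` frame from the snippet above with its timestamp column named `new_time` (the remaining column names come from whatever the CSV header provides):

```scala
import org.apache.spark.sql.functions._

// Every column except the timestamp one.
val nonTimeCols = result.columns.filterNot(_ == "new_time")

// Question 1: cast all non-timestamp columns in one pass. withColumn
// returns a new frame, so the casts must be folded together rather
// than called in a bare map that discards each result.
val casted = nonTimeCols.foldLeft(result) { (df, c) =>
  df.withColumn(c, col(c).cast("float"))
}

// Question 2: average every non-timestamp column at once via an
// aggregation map of column name -> function name.
val finalresult = casted
  .groupBy("new_time")
  .agg(nonTimeCols.map(c => c -> "avg").toMap)
  .sort("new_time")
```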
Labels:
- Apache Spark
02-24-2017
09:35 AM
@Bernhard Walter Thanks a lot, but could you do it in Scala, please? That would be very kind of you. Thanks.
02-24-2017
08:51 AM
Hi friends, I have CSV files on the local file system, and they all have the same header. I want to loop over them and merge them into one final CSV file with that single header. Is there a solution using spark-csv, or anything else in Spark? Thanks.
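A sketch of one way to do this in Spark, assuming the files sit under a hypothetical local folder C:/data (the path is a placeholder): a glob path makes Spark read every file in one call, so no explicit loop is needed, and each file's identical header row is consumed on read.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").appName("merge-csv").getOrCreate()

// Read all CSV files at once via a glob; the shared header of each
// file is recognized and dropped from the data rows.
val merged = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("C:/data/*.csv")

// coalesce(1) yields a single part file containing one header row.
merged.coalesce(1).write.option("header", "true").csv("C:/data/merged")
```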
Labels:
- Apache Spark
02-15-2017
02:59 PM
@Michael Young Thanks for your reply. I am developing a .NET app that needs to access a remote HDFS cluster. Here is the code I used: when it runs locally I get a correct answer, but when I put in the IP of the remote cluster I get an exception.

List<string> lstDirectoriesName = new List<string>();
try
{
WebHDFSClient hdfsClient = new WebHDFSClient(new Uri("http://io-dell-svr8:50070"), "Administrator");
Microsoft.Hadoop.WebHDFS.DirectoryListing directroyStatus = hdfsClient.GetDirectoryStatus("/dataset").Result;
List<DirectoryEntry> lst = directroyStatus.Files.ToList();
foreach (DirectoryEntry var in lst)
{
lstDirectoriesName.Add(var.PathSuffix);
}
return lstDirectoriesName;
}
catch (Exception exException)
{
Console.WriteLine(exException.Message);
return null;
}

Any help would be appreciated.
02-15-2017
01:06 PM
I am using HDP, and I would like to use the Hadoop WebHDFS API from .NET to access Hadoop on a remote machine. I would like to know the URI to use to reach HDP remotely over HTTP. It should be something like http://host:port/webhdfs — what is the port to use? Thank you.
Labels:
- Hortonworks Data Platform (HDP)
02-14-2017
09:14 AM
Hi, same problem here. I want to make these interfaces hidden and running in the background. Thanks.
02-14-2017
07:31 AM
Hello guys, I am running a Hadoop cluster using HDP. When I start Hadoop there are two CLI windows, one for the DataNode and one for the NameNode. Is there any possibility to run them in the background? Thank you a lot!
Labels:
- Apache Hadoop