Member since: 01-21-2018
Posts: 58
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3301 | 09-23-2017 03:05 AM
 | 1566 | 08-31-2017 08:20 PM
 | 6192 | 05-15-2017 06:06 PM
05-11-2018
04:01 PM
Hello everyone, I have a situation and I would like to count on the community's advice and perspective. I'm working with PySpark 2.0 and Python 3.6 in an AWS environment with Glue. I need to pull historical information across many years and then apply a join over the results of a bunch of previous queries. So I decided to create a DF for each query, so that I could easily iterate over the years and months I want to go back and create the DFs on the fly. The problem comes up when I need to join the DFs created in the loop: I reuse the same DF name within the loop, and when I build a DF name inside the loop, the name is read as a string, not as an actual DF, so I cannot join them later. So far my code looks like:

query_text = 'SELECT * FROM TABLE WHERE MONTH = {}'
months = [1, 2]
frame_list = []
for item in months:
    df_name = 'cohort_2013_{}'.format(item)
    query = query_text.format(item)
    frame_list.append(df_name)  # I intend to keep the DF names in a list so I can recall them later
    df = spark.sql(query)
    df = DynamicFrame.fromDF(df, glueContext, "df")
    applyformat = ApplyMapping.apply(
        frame=df,
        mappings=[("field1", "string", "field1", "string"),
                  ("field2", "string", "field2", "string")],
        transformation_ctx="applyformat")

for df_name in frame_list:
    # create a join query over all the created DFs
    ...

Please, if someone knows how I could achieve this requirement, let me know your ideas. Thanks so much.
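For what it's worth, here is a minimal sketch of one way around this: store the DataFrame objects themselves in the list (not their names) and fold them together with a join after the loop. It assumes an existing spark session; the join column member_id is a hypothetical placeholder, not something from the original post.

from functools import reduce

query_text = 'SELECT * FROM TABLE WHERE MONTH = {}'
months = [1, 2]

frames = []
for item in months:
    # run the query and keep the resulting DataFrame object itself
    frames.append(spark.sql(query_text.format(item)))

# fold the list into a single DataFrame by joining pairwise;
# 'member_id' is a hypothetical join key, used here only for illustration
joined = reduce(lambda left, right: left.join(right, on='member_id'), frames)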
Labels:
- Apache Spark
02-25-2018
09:14 PM
Sorry, sometimes I don't read all the way through and an issue comes up. 😞 It works seamlessly!
02-25-2018
08:50 PM
Hello everyone, my HDP 2.6 sandbox is running like never before; it's actually super cool. However, I'm not able to connect to Hive through Beeline, because I need to change my login to maria_dev (or any other user) in the shell. I tried su maria_dev (and other users) but it doesn't go through. Take a look at the attached image. If anyone can give me an idea about how to log in under the maria_dev credentials, I'd appreciate it, because I need to run Spark SQL and Spark scripts. Thanks so much.
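A hedged sketch of what usually works on the HDP sandbox (defaults may differ by version, so treat these as assumptions): the sandbox typically maps SSH to port 2222, maria_dev usually exists with password maria_dev, and Beeline can pass the user directly rather than switching users in the shell first:

ssh maria_dev@localhost -p 2222
beeline -u jdbc:hive2://localhost:10000 -n maria_dev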
Labels:
- Apache Hive
01-12-2018
06:16 PM
Do you know of a tutorial or documentation about the right way to set up the required services, at least Spark2 and HDFS? My VM keeps failing over and over, even after I turn off maintenance mode, and I'm not able to start the services. Thanks so much.
01-12-2018
03:23 PM
1 Kudo
Guys, thanks so much, I'm already in. I also want to share this amazing tutorial, which has enough documentation to play with our sandbox: https://github.com/hortonworks/data-tutorials/blob/master/tutorials/hdp/learning-the-ropes-of-the-hortonworks-sandbox/tutorial.md. Thanks so much @Julián Rodríguez @Edgar Orendain
01-12-2018
03:16 PM
@Edgar Orendain @Julián Rodríguez Guys, I already got the machine up through my browser and PuTTY as you specified, but where can I find info about how to access it through Ambari, and which user and password I can use over SSH? Thanks so much.
01-12-2018
02:06 PM
I'm going to try the suggested workaround; however, this is not the first time this has happened to me. Months ago I was playing with the HDP 2.4 VM and it was the same. I will keep you posted. My goal is really to play with Spark in HDP; I don't know whether HDF can work for that purpose. Thanks.
01-09-2018
10:07 PM
Hello @Jay Kumar SenSharma, I get the same issue. Please take a look at my current host configuration: OS: Windows 10, VirtualBox 5.4.2, RAM 16 GB, hard disk 1 TB. Besides the issue with the downloaded image that says Docker (I re-downloaded it with the right MD5), my machine never starts up; 30 minutes later, see how it looks.
Labels:
- Hortonworks Data Platform (HDP)
09-23-2017
03:05 AM
Hi guys, I'm so so.... Well, I just remembered that you can create an external table over the folder where all the files with the same structure are located. That way, I can load all the records in one shot:

CREATE EXTERNAL TABLE bixi_his
(
  STATIONS ARRAY<STRUCT<id:INT, s:STRING, n:STRING, st:STRING, b:STRING, su:STRING, m:STRING, lu:STRING, lc:STRING, bk:STRING, bl:STRING, la:FLOAT, lo:FLOAT, da:INT, dx:INT, ba:INT, bx:INT>>,
  SCHEMESUSPENDED STRING,
  TIMELOAD BIGINT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/ingenieroandresangel/datasets/bixi2017/';

Thanks
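As an illustrative follow-up (the query below is my own sketch of how a table like this could be used, not part of the original solution), the STATIONS array can be unpacked with LATERAL VIEW explode to reach the struct fields:

-- illustrative only: explode the stations array and read a few struct fields
SELECT station.id, station.la, station.lo
FROM bixi_his
LATERAL VIEW explode(stations) t AS station;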
09-23-2017
01:02 AM
Look, I'm trying to analyze many files in just one Hive table. Key insights: I'm working with JSON files, and the table structure is:

CREATE EXTERNAL TABLE test1
(
  STATIONS ARRAY<STRING>,
  SCHEMESUSPENDED STRING,
  TIMELOAD TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/andres/hive/bixihistorical/';

I need to load around 50 files, all of them with the same structure. I have tried things like:

LOAD DATA INPATH '/user/andres/datasets/bixi2017/*.json' OVERWRITE INTO TABLE test1;
LOAD DATA INPATH '/user/andres/datasets/bixi2017/*' OVERWRITE INTO TABLE test1;
LOAD DATA INPATH '/user/andres/datasets/bixi2017/' OVERWRITE INTO TABLE test1;

None of those have worked. Any idea, guys, about how I should proceed? Thanks so much
Labels:
- Apache Hive