Member since: 01-21-2018
Posts: 58
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3301 | 09-23-2017 03:05 AM
 | 1566 | 08-31-2017 08:20 PM
 | 6192 | 05-15-2017 06:06 PM
05-11-2018
04:01 PM
Hello everyone, I have a situation and I would like to count on the community's advice and perspective. I'm working with PySpark 2.0 and Python 3.6 in an AWS environment with Glue. I need to pull historical information across many years and then apply a join over the results of a bunch of previous queries. So I decided to create a DF for each query, so that I could easily iterate over the years and months I want to go back and create the DFs on the fly. The problem comes up when I need to join the DFs created in the loop: I reuse the same DF name within the loop, and when I build a DF name inside the loop, the name is read as a string, not as an actual DF, so I cannot join them later. So far my code looks like:

query_text = 'SELECT * FROM TABLE WHERE MONTH = {}'
months = [1, 2]
frame_list = []
for item in months:
    df_name = 'cohort_2013_{}'.format(item)
    query = query_text.format(item)
    frame_list.append(df_name)  # I intend to keep the DF names in a list so I can recall them later
    df = spark.sql(query)
    df = DynamicFrame.fromDF(df, glueContext, "df")
    applyformat = ApplyMapping.apply(
        frame=df,
        mappings=[("field1", "string", "field1", "string"),
                  ("field2", "string", "field2", "string")],
        transformation_ctx="applyformat")

for df_name in frame_list:
    # create a join query over all the created DFs
    ...

Please, if someone knows how I could achieve this requirement, let me know your ideas. Thanks so much.
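For what it's worth, here is a minimal sketch of one way around this: store the DataFrame objects themselves in the list (not their names) and fold them together with a join after the loop. It assumes an existing spark session; the join column member_id is a hypothetical placeholder, not something from the original post.

from functools import reduce

query_text = 'SELECT * FROM TABLE WHERE MONTH = {}'
months = [1, 2]

frames = []
for item in months:
    # run the query and keep the resulting DataFrame object itself
    frames.append(spark.sql(query_text.format(item)))

# fold the list into a single DataFrame by joining pairwise;
# 'member_id' is a hypothetical join key, used here only for illustration
joined = reduce(lambda left, right: left.join(right, on='member_id'), frames)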
Labels:
- Apache Spark
02-25-2018
09:14 PM
Sorry, sometimes I don't read all the way through and an issue comes up. 😞 It works seamlessly!
02-25-2018
08:50 PM
Hello everyone, my HDP 2.6 sandbox is running like never before; it's actually super cool. However, I'm not able to connect to Hive through Beeline, because I need to change my login to maria_dev (or any other user) in the shell. I tried su maria_dev (and other users) but it doesn't go through. Take a look at the attached image. If anyone can give me an idea about how to log in under the maria_dev credentials, I'd appreciate it, because I need to run Spark SQL and Spark scripts. Thanks so much.
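A hedged sketch of what usually works on the HDP sandbox (defaults may differ by version, so treat these as assumptions): the sandbox typically maps SSH to port 2222, maria_dev usually exists with password maria_dev, and Beeline can pass the user directly rather than switching users in the shell first:

ssh maria_dev@localhost -p 2222
beeline -u jdbc:hive2://localhost:10000 -n maria_dev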
Labels:
- Apache Hive
01-12-2018
06:16 PM
Do you know of a tutorial or documentation about the right way to set up the required services, at least Spark2 and HDFS? My VM keeps failing over and over, even after I turn off maintenance mode, and I'm not able to start the services. Thanks so much.
01-12-2018
03:23 PM
1 Kudo
Guys, thanks so much, I'm already in. I also want to share this amazing tutorial, which has enough documentation to play with our sandbox: https://github.com/hortonworks/data-tutorials/blob/master/tutorials/hdp/learning-the-ropes-of-the-hortonworks-sandbox/tutorial.md. Thanks so much @Julián Rodríguez @Edgar Orendain
01-12-2018
03:16 PM
@Edgar Orendain @Julián Rodríguez Guys, I already got the machine up through my browser and PuTTY as you specified, but where can I find info about how to access it through Ambari, and which user and password I can use over SSH? Thanks so much.
01-12-2018
02:06 PM
I'm going to try the suggested workaround; however, this is not the first time this has happened to me. Months ago I was playing with the HDP 2.4 VM and it was the same. I will keep you posted. My goal is really to play with Spark in HDP; I don't know whether HDF can work for that purpose. Thanks.
01-09-2018
10:07 PM
Hello @Jay Kumar SenSharma, I get the same issue. Please take a look at my current host configuration: OS: Windows 10, VirtualBox 5.4.2, RAM 16 GB, hard disk 1 TB. Besides the issue with the downloaded image that says Docker (I re-downloaded it with the right MD5), my machine never starts up; 30 minutes later, see how it looks.
Labels:
- Hortonworks Data Platform (HDP)
09-23-2017
03:05 AM
Hi guys, I'm so so.... Well, I just remembered that you can create an external table over the folder where all the files with the same structure are located. That way, I can load all the records in one shot:

CREATE EXTERNAL TABLE bixi_his
(
  STATIONS ARRAY<STRUCT<id:INT, s:STRING, n:STRING, st:STRING, b:STRING, su:STRING, m:STRING, lu:STRING, lc:STRING, bk:STRING, bl:STRING, la:FLOAT, lo:FLOAT, da:INT, dx:INT, ba:INT, bx:INT>>,
  SCHEMESUSPENDED STRING,
  TIMELOAD BIGINT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/ingenieroandresangel/datasets/bixi2017/';

Thanks
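As an illustrative follow-up (the query below is my own sketch of how a table like this could be used, not part of the original solution), the STATIONS array can be unpacked with LATERAL VIEW explode to reach the struct fields:

-- illustrative only: explode the stations array and read a few struct fields
SELECT station.id, station.la, station.lo
FROM bixi_his
LATERAL VIEW explode(stations) t AS station;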
09-23-2017
01:02 AM
Look, I'm trying to analyze many files in just one Hive table. Key insights: I'm working with JSON files, and the table structure is:

CREATE EXTERNAL TABLE test1
(
  STATIONS ARRAY<STRING>,
  SCHEMESUSPENDED STRING,
  TIMELOAD TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/andres/hive/bixihistorical/';

I need to load around 50 files, all of them with the same structure. I have tried things like:

LOAD DATA INPATH '/user/andres/datasets/bixi2017/*.json' OVERWRITE INTO TABLE test1;
LOAD DATA INPATH '/user/andres/datasets/bixi2017/*' OVERWRITE INTO TABLE test1;
LOAD DATA INPATH '/user/andres/datasets/bixi2017/' OVERWRITE INTO TABLE test1;

None of those have worked. Any idea, guys, about how I should proceed? Thanks so much
Labels:
- Apache Hive