Member since: 05-09-2016
Posts: 280
Kudos Received: 58
Solutions: 31

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3745 | 03-28-2018 02:12 PM |
| | 3022 | 01-09-2018 09:05 PM |
| | 1649 | 12-13-2016 05:07 AM |
| | 5033 | 12-12-2016 02:57 AM |
| | 4311 | 12-08-2016 07:08 PM |
10-13-2016
06:27 PM
It seems like you have downloaded the HDF tar.gz file but have not extracted it. Try running this command after the SCP step:

tar xvf /root/HDF-2.0.0.0-579.tar.gz

After this, run the JDK install script. Let us know if you have any further issues.
10-12-2016
06:30 PM
Thanks a lot @Bernhard Walter, that works like a charm!
10-12-2016
02:37 AM
Hi all, I want to create a dataframe in Spark and assign a proper schema to the data. I have multiple files under one HDFS directory and I am reading all of them using the following command:

%pyspark
logs_df = sqlContext.read.text("hdfs://sandbox.hortonworks.com:8020/tmp/nifioutput")

This creates a dataframe and stores everything in a single column. Next, I want to derive multiple columns from this single column. Typing this:

%pyspark
from pyspark.sql.functions import split, expr
logs_df.select(expr("(split(value, '|'))[0]").cast("string").alias("IP"),
               expr("(split(value, '|'))[1]").cast("string").alias("Time"),
               expr("(split(value, '|'))[2]").cast("string").alias("Request_Type"),
               expr("(split(value, '|'))[3]").cast("integer").alias("Response_Code"),
               expr("(split(value, '|'))[4]").cast("string").alias("City"),
               expr("(split(value, '|'))[5]").cast("string").alias("Country"),
               expr("(split(value, '|'))[6]").cast("string").alias("Isocode"),
               expr("(split(value, '|'))[7]").cast("double").alias("Latitude"),
               expr("(split(value, '|'))[8]").cast("double").alias("Longitude")).show()

gives me a strange result. It takes only one character from the row instead of splitting on the delimiter (i.e. |) and stores it in the different columns:

+---+----+------------+-------------+----+-------+-------+--------+---------+
| IP|Time|Request_Type|Response_Code|City|Country|Isocode|Latitude|Longitude|
+---+----+------------+-------------+----+-------+-------+--------+---------+
|  1|   3|           3|         null|   6|      8|      .|     1.0|      8.0|
|  1|   3|           3|         null|   6|      8|      .|     1.0|      8.0|
|  1|   3|           3|         null|   6|      8|      .|     1.0|      8.0|

As you can see, each column gets only one character; 133.68.18.180 should be an IP address only. Is this the right way to create multiple columns out of one? Please help. PS - I want to avoid regexp_extract for this.
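Note that pyspark.sql.functions.split() treats its pattern argument as a Java regular expression, in which a bare '|' is the alternation operator and matches the empty string between every character, which is exactly the one-character-per-column behaviour seen above. Below is a minimal sketch of the same select written with the DataFrame API and an escaped pipe; the column names are taken from the post, while the escaped pattern and the getItem() style are assumptions on my part rather than the accepted answer:

%pyspark
from pyspark.sql.functions import split, col

# split() interprets its second argument as a Java regex; a bare '|' matches
# the empty string between characters, so each character lands in its own
# element. Escaping it makes the pipe a literal delimiter (assumed fix).
parts = split(col("value"), r"\|")

logs_df.select(
    parts.getItem(0).cast("string").alias("IP"),
    parts.getItem(1).cast("string").alias("Time"),
    parts.getItem(2).cast("string").alias("Request_Type"),
    parts.getItem(3).cast("integer").alias("Response_Code"),
    parts.getItem(4).cast("string").alias("City"),
    parts.getItem(5).cast("string").alias("Country"),
    parts.getItem(6).cast("string").alias("Isocode"),
    parts.getItem(7).cast("double").alias("Latitude"),
    parts.getItem(8).cast("double").alias("Longitude"),
).show()

The same escaping works inside expr() as well, but the backslash then has to survive both the Python string and the SQL string literal, so the DataFrame API keeps it simpler.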
Labels:
- Apache Spark
10-10-2016
09:16 PM
Thank you so much @Andy LoPresto, it worked. It was capturing nothing earlier, perhaps because of other 3-digit numbers in the line. The log format is consistent throughout the file, so yeah, the workflow flowed like water 🙂
10-10-2016
05:00 PM
1 Kudo
Hi all, I am using NiFi to extract attributes like IP, timestamp, request type, and status code from web server logs. This is a sample of my data:

133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566

I am using regex in the ExtractText processor to do this. I am getting the IP, timestamp, and request type, but I am not able to extract the status code, which is 200 in this case. I am using (\\d{3}) right now but it is not working. Has anyone tried this before?
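Since the sample line contains several other three-digit runs (the IP octets, pieces of the timestamp), a bare \d{3} is easy to mis-anchor. Below is a small sketch in plain Python re (not the NiFi ExtractText processor itself, and the exact pattern is an assumption rather than the accepted answer) that anchors the capture on the closing quote of the request so the three digits matched are the status code:

import re

# Sample log line from the post above
line = ('133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] '
        '"GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566')

# Anchoring on the closing quote of the request keeps the capture away from
# the IP octets and the timestamp, which also contain three-digit runs.
status_pattern = re.compile(r'"\s+(\d{3})\s+')

match = status_pattern.search(line)
if match:
    print(match.group(1))  # prints: 200

In ExtractText the equivalent Java regex would go in as the value of a dynamic property (without the Python r'' quoting), with the capture group populating the corresponding attribute.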
Labels:
- Apache NiFi
10-05-2016
02:43 PM
Thanks a lot @Andrew Ryan. I moved all the files out of the topologies folder and then restarted Knox; it re-created the default, admin, and knoxsso XML files. I tested default and it worked fine. Then I moved the old knox_sample.xml back into the folder, restarted Knox, and it worked. Still wondering what the issue could have been.
10-04-2016
06:43 PM
1 Kudo
Please make sure that your Postgres instance is running while you stop/start the Ambari server. Also, please provide the full Ambari server log file so that it is easier to figure out the issue.
10-03-2016
06:03 PM
1 Kudo
There is a typo in the code: please change %jbdc(hive) to %jdbc(hive). Further, for the riskfactor table issue, please drop the riskfactor table first and then recreate it using CTAS. Run the following in one paragraph:

%hive
drop table riskfactor

and in another paragraph:

%spark
hiveContext.sql("create table riskfactor as select * from finalresults")
09-30-2016
06:06 PM
1 Kudo
Hi @Cindy Liu, can you please replace %jdbc with %jdbc(hive) and then run the query? It should work.
09-30-2016
04:43 PM
Hi @Andrew Ryan, yes, Ranger is enabled with Knox. Please find the attached gateway.log file. I also went through the gateway-audit.log file, which shows this:

16/09/30 01:13:22 |||audit|||||redeploy|topology|knoxsso|unavailable|
16/09/30 01:13:23 |||audit|||||redeploy|topology|admin|unavailable|
16/09/30 01:13:23 |||audit|||||redeploy|topology|default|unavailable|
16/09/30 01:13:23 |||audit|||||redeploy|topology|knox_sample|unavailable|