Member since: 05-09-2016
Posts: 280
Kudos Received: 58
Solutions: 31

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3745 | 03-28-2018 02:12 PM |
| | 3022 | 01-09-2018 09:05 PM |
| | 1649 | 12-13-2016 05:07 AM |
| | 5033 | 12-12-2016 02:57 AM |
| | 4311 | 12-08-2016 07:08 PM |
10-13-2016
06:27 PM
It seems like you have downloaded the HDF tar.gz file but have not extracted it. Try running this command after the SCP step:

tar xvf /root/HDF-2.0.0.0-579.tar.gz

After this, run the JDK install script. Let us know if you have any further issues.
10-12-2016
06:30 PM
Thanks a lot @Bernhard Walter, that works like a charm!
10-12-2016
02:37 AM
Hi all, I want to create a dataframe in Spark and assign a proper schema to the data. I have multiple files under one HDFS directory and I am reading all of them using the following command:

%pyspark
logs_df = sqlContext.read.text("hdfs://sandbox.hortonworks.com:8020/tmp/nifioutput")

This creates a dataframe and stores everything in a single column. Next, I want to derive multiple columns from this single column. Typing this:

%pyspark
from pyspark.sql.functions import split, expr
logs_df.select(expr("(split(value, '|'))[0]").cast("string").alias("IP"),
               expr("(split(value, '|'))[1]").cast("string").alias("Time"),
               expr("(split(value, '|'))[2]").cast("string").alias("Request_Type"),
               expr("(split(value, '|'))[3]").cast("integer").alias("Response_Code"),
               expr("(split(value, '|'))[4]").cast("string").alias("City"),
               expr("(split(value, '|'))[5]").cast("string").alias("Country"),
               expr("(split(value, '|'))[6]").cast("string").alias("Isocode"),
               expr("(split(value, '|'))[7]").cast("double").alias("Latitude"),
               expr("(split(value, '|'))[8]").cast("double").alias("Longitude")).show()

gives me a strange result. It takes only one character from the row instead of splitting on the delimiter (i.e. |) and stores it in the different columns:

+---+----+------------+-------------+----+-------+-------+--------+---------+
| IP|Time|Request_Type|Response_Code|City|Country|Isocode|Latitude|Longitude|
+---+----+------------+-------------+----+-------+-------+--------+---------+
|  1|   3|           3|         null|   6|      8|      .|     1.0|      8.0|
|  1|   3|           3|         null|   6|      8|      .|     1.0|      8.0|
|  1|   3|           3|         null|   6|      8|      .|     1.0|      8.0|

As you can see, each column gets only one character; 133.68.18.180 should be an IP address only. Is this the right way to create multiple columns out of one? Please help. PS - I want to avoid regexp_extract for this.
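Note that pyspark.sql.functions.split() treats its pattern argument as a Java regular expression, in which a bare '|' is the alternation operator and matches the empty string between every character, which is exactly the one-character-per-column behaviour seen above. Below is a minimal sketch of the same select written with the DataFrame API and an escaped pipe; the column names are taken from the post, while the escaped pattern and the getItem() style are assumptions on my part rather than the accepted answer:

%pyspark
from pyspark.sql.functions import split, col

# split() interprets its second argument as a Java regex; a bare '|' matches
# the empty string between characters, so each character lands in its own
# element. Escaping it makes the pipe a literal delimiter (assumed fix).
parts = split(col("value"), r"\|")

logs_df.select(
    parts.getItem(0).cast("string").alias("IP"),
    parts.getItem(1).cast("string").alias("Time"),
    parts.getItem(2).cast("string").alias("Request_Type"),
    parts.getItem(3).cast("integer").alias("Response_Code"),
    parts.getItem(4).cast("string").alias("City"),
    parts.getItem(5).cast("string").alias("Country"),
    parts.getItem(6).cast("string").alias("Isocode"),
    parts.getItem(7).cast("double").alias("Latitude"),
    parts.getItem(8).cast("double").alias("Longitude"),
).show()

The same escaping works inside expr() as well, but the backslash then has to survive both the Python string and the SQL string literal, so the DataFrame API keeps it simpler.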
Labels:
- Apache Spark
10-10-2016
09:16 PM
Thank you so much @Andy LoPresto, it worked. It was capturing nothing earlier, perhaps because of other 3-digit numbers in the line. The log format is consistent throughout the file, so yeah, the workflow flowed like water 🙂
10-10-2016
05:00 PM
1 Kudo
Hi all, I am using NiFi to extract attributes like IP, timestamp, request type, and status code from web server logs. This is a sample of my data:

133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566

I am using regex in the ExtractText processor to do this. I am getting the IP, timestamp, and request type, but I am not able to extract the status code, which is 200 in this case. I am using (\\d{3}) right now but it is not working. Has anyone tried this before?
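Since the sample line contains several other three-digit runs (the IP octets, pieces of the timestamp), a bare \d{3} is easy to mis-anchor. Below is a small sketch in plain Python re (not the NiFi ExtractText processor itself, and the exact pattern is an assumption rather than the accepted answer) that anchors the capture on the closing quote of the request so the three digits matched are the status code:

import re

# Sample log line from the post above
line = ('133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] '
        '"GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566')

# Anchoring on the closing quote of the request keeps the capture away from
# the IP octets and the timestamp, which also contain three-digit runs.
status_pattern = re.compile(r'"\s+(\d{3})\s+')

match = status_pattern.search(line)
if match:
    print(match.group(1))  # prints: 200

In ExtractText the equivalent Java regex would go in as the value of a dynamic property (without the Python r'' quoting), with the capture group populating the corresponding attribute.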
Labels:
- Apache NiFi
10-05-2016
02:43 PM
Thanks a lot @Andrew Ryan. I moved all the files out of the topologies folder and then restarted Knox; it re-created the default, admin, and knoxsso XML files. I tested default and it worked fine. Then I moved the old knox_sample.xml back into the folder, restarted Knox, and it worked. Still wondering what the issue could have been.
10-04-2016
06:43 PM
1 Kudo
Please make sure that your Postgres instance is running while you stop/start the Ambari server. Also, please provide the full Ambari server log file so that it is easier to figure out the issue.
10-03-2016
06:03 PM
1 Kudo
There is a typo in the code: please change %jbdc(hive) to %jdbc(hive). Further, for the riskfactor table issue, please drop the riskfactor table first and then recreate it using CTAS. Run the following in one paragraph:

%hive
drop table riskfactor

and in another paragraph:

%spark
hiveContext.sql("create table riskfactor as select * from finalresults")
09-30-2016
06:06 PM
1 Kudo
Hi @Cindy Liu, can you please replace %jdbc with %jdbc(hive) and then run the query? It should work.
09-30-2016
04:43 PM
Hi @Andrew Ryan, yes, Ranger is enabled with Knox. Please find the attached gateway.log file. I also went through the gateway-audit.log file, which shows this:

16/09/30 01:13:22 |||audit|||||redeploy|topology|knoxsso|unavailable|
16/09/30 01:13:23 |||audit|||||redeploy|topology|admin|unavailable|
16/09/30 01:13:23 |||audit|||||redeploy|topology|default|unavailable|
16/09/30 01:13:23 |||audit|||||redeploy|topology|knox_sample|unavailable|