Member since
05-09-2016
280
Posts
58
Kudos Received
31
Solutions
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 3813 | 03-28-2018 02:12 PM |
|  | 3048 | 01-09-2018 09:05 PM |
|  | 1668 | 12-13-2016 05:07 AM |
|  | 5142 | 12-12-2016 02:57 AM |
|  | 4419 | 12-08-2016 07:08 PM |
11-11-2016
08:37 PM
It would be a great help if someone could reply to this thread; I'm kind of stuck here. Thanks.
11-10-2016
08:06 PM
Never mind, I got this working. The problem was the double quote, so I changed the replace string to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('"',''):replace('\n','')} and it worked.
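For anyone wondering why the quote mattered: a raw double quote inside the msg value closes the JSON string early, and a raw newline is an illegal control character inside one, so the row is no longer valid JSON. A minimal Python sketch (the tweet text is made up) showing the failure and the effect of the replace chain above:

```python
import json

msg = 'RT @user: "hello"\nworld'  # hypothetical tweet text

# Naive templating, like the original ReplaceText string: the embedded
# quote terminates the JSON string early and the newline is an illegal
# control character, so the row fails to parse.
naive = '{"msg":"%s"}' % msg
try:
    json.loads(naive)
    naive_is_valid = True
except ValueError:
    naive_is_valid = False

# Mirroring the replace('"',''):replace('\n','') chain yields valid JSON.
cleaned = '{"msg":"%s"}' % msg.replace('"', '').replace('\n', '')
parsed = json.loads(cleaned)
```

Escaping the characters (as json.dumps would) rather than stripping them would preserve the message text, but stripping is what the expression above does.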
11-10-2016
06:44 PM
Hi guys, I am trying to get Twitter data from NiFi into Hive in JSON format using the Hive JSON SerDe (referring to https://github.com/rcongiu/Hive-JSON-Serde). I am using the ReplaceText processor to format the JSON data as one record per line, which is a requirement of this SerDe. My replacement string in the processor is:

{"tweet_id":${twitter.tweet_id},"created_unixtime":${twitter.unixtime},"created_time":"${twitter.time}","lang":"${twitter.language}","displayname":"${twitter.handle}","time_zone":"${twitter.time_zone}","msg":"${twitter.msg:replace(',''):replace('\n','')}"}

A sample of the data in HDFS looks like this:

{"tweet_id":796782698631233536,"created_unixtime":1478802773517,"created_time":"Thu Nov 10 18:32:53 +0000 2016","lang":"it","displayname":"robinit66","time_zone":"Rome","msg":"RT @ubaldo_angelo: spiace davvero per tutti quelli che avevano intravisto nelle carotine e nei cavoletti di MIchelle le basi di una nuova c???"}
{"tweet_id":796782702829666304,"created_unixtime":1478802774518,"created_time":"Thu Nov 10 18:32:54 +0000 2016","lang":"it","displayname":"vespro4","time_zone":"Pacific Time (US & Canada)","msg":"Comprendo l'evasore fiscale https://t.co/Jn0s9KZEUz"}

Then I ran SELECT * against the table, which is an external table in Hive, and got this error:

{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat

I thought of replacing all special characters using replaceAll from the NiFi Expression Language, so I changed the replace string for the message to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('\n','')}. The data in HDFS looked much cleaner, but the error was still there. Then I hard-coded the msg string with a bunch of special characters and text and it worked; please see sample-data-worked.png. Any help would be greatly appreciated.
Labels:
- Apache Hive
- Apache NiFi
11-09-2016
06:21 PM
2 Kudos
@Gunesh P, you have to create a user in the OS. Log in to the Sandbox terminal as root and type:

useradd abc
passwd abc

This user will not have sudo access. To grant it, type visudo and add the abc line under the existing root entry:

## Allow root to run any commands anywhere
root ALL=(ALL) ALL
abc ALL=(ALL) ALL

Save the file. After this you can SSH in as that user, and later you can give the user read/write access through the HDFS policies of Ranger.
11-08-2016
10:46 PM
Hi experts, I am using Spark 2.0.0 and I have an airline dataset. I created a SparkR DataFrame and am able to run some of the functions of the SparkR DataFrame API, but I am running into exceptions while building a linear model using the Gaussian family. Here is my command:

model <- glm(train_data, ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")

ERROR:

Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : org.apache.spark.sql.AnalysisException: Cannot resolve column name "formula" among (YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, CARRIER, FL_NUM, ORIGIN, DEST, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, CANCELLED, CANCELLATION_CODE, AIR_TIME, DISTANCE, WEEKEND, DEP_HOUR, DELAY_LABELED);

For some reason it tries to fetch a "formula" column, so I replaced the above command with:

model <- glm(train_data, formula = ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")

This time, I got this error:

ERROR Executor: Exception in task 0.0 in stage 45.0 (TID 95) scala.MatchError: [null,1.0,[1.0,11.0,5.0,1.0,2475.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)

Has anyone seen this kind of behaviour? Thanks in advance.
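A note for anyone hitting the same thing: the offending row in the MatchError is [null,1.0,[...]], i.e. a record whose label (ARR_DELAY) is null. One hedged guess is that dropping rows with nulls in the modelled columns before fitting (SparkR has a dropna function for this) avoids the error; here is a Spark-free Python sketch of that pre-filtering idea (column names from the post, data made up):

```python
# Toy records standing in for the airline rows; None mimics a null label.
rows = [
    {"ARR_DELAY": 12.0, "DEP_DELAY": 5.0, "DISTANCE": 2475.0},
    {"ARR_DELAY": None, "DEP_DELAY": 1.0, "DISTANCE": 2475.0},  # null label
]

# Keep only rows where every column used in the formula is present,
# analogous to calling dropna on the SparkR DataFrame before glm().
model_cols = ["ARR_DELAY", "DEP_DELAY", "DISTANCE"]
clean = [r for r in rows if all(r[c] is not None for c in model_cols)]
```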
11-08-2016
10:34 PM
@Manoj Dhake, it depends on the dependent variable: the unit of RMSE is the same as that of the dependent variable. If your data ranges from 0 to 100000, an RMSE of 3000 is small; but if the range goes from 0 to 1, it is pretty huge. Try playing with other input variables and compare the RMSE values; the smaller the RMSE, the better the model. Also, compare the RMSE of your training and testing data. If they are almost similar, your model is good; if the RMSE for the testing data is much higher than that of the training data, it is likely that you have badly overfit the data.
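To make the comparison concrete, here is a small self-contained sketch (the numbers are made up, not from any thread data) computing and comparing RMSE on training and testing predictions:

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error: sqrt of the mean squared residual.
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical delays (minutes) and model predictions.
train_actual = [10.0, 25.0, 40.0, 5.0]
train_pred   = [12.0, 22.0, 38.0, 8.0]
test_actual  = [15.0, 30.0, 50.0]
test_pred    = [11.0, 36.0, 42.0]

train_rmse = rmse(train_actual, train_pred)
test_rmse  = rmse(test_actual, test_pred)
# If test_rmse is much larger than train_rmse, the model is likely overfit.
```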
11-03-2016
06:10 PM
1 Kudo
@Hitesh Rajpurohit, make sure that Ranger Tagsync is up and running. Go to Ambari, click Ranger in the list of services, click Tagsync, and start the component.
11-02-2016
07:23 PM
1 Kudo
@azeltov, @Edgar Daeds: when we run anything from the Livy server, it tries to connect to the Resource Manager on port 8032, but on the Sandbox the value of yarn.resourcemanager.address uses port 8050, so it waits for 10 tries and then fails. Go to Ambari -> YARN -> Configs, search for yarn.resourcemanager.address, and change the port to 8032. Restart YARN and then try running %livy.pyspark again.
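The Ambari change above corresponds to this property in yarn-site.xml (the Sandbox hostname shown is an assumption; keep whatever host your config already uses and change only the port):

```xml
<property>
  <!-- Address the ResourceManager listens on for client RPCs;
       Livy expects the default port 8032. -->
  <name>yarn.resourcemanager.address</name>
  <value>sandbox.hortonworks.com:8032</value>
</property>
```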
11-01-2016
08:49 PM
I am trying to install RStudio Server on Hortonworks Sandbox 2.5, and downloaded this rpm: rstudio-server-rhel-0.99.893-x86_64.rpm. After installing the package, when I run rstudio-server verify-installation, it shows me the following message:

initctl: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused

I have tried starting and stopping the RStudio server; it shows the same message. Has anyone experienced this before? Please help. PS: Since it is a Docker container, port 8787 is not open, so I have configured /etc/rstudio/rserver.conf to use port 8090.
Labels:
- Hortonworks Data Platform (HDP)
11-01-2016
04:37 AM
1 Kudo
@Volodymyr Ostapiv, the Sandbox release has been updated; please download it again and you should no longer face this issue. http://hortonworks.com/downloads/