Member since: 11-18-2014
Posts: 196
Kudos Received: 18
Solutions: 8
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8663 | 03-16-2016 05:54 AM
 | 3997 | 02-05-2016 04:49 AM
 | 2848 | 01-08-2016 06:55 AM
 | 16300 | 09-29-2015 01:31 AM
 | 1728 | 05-06-2015 01:50 AM
11-05-2015
09:43 AM
Hello, Thank you for your answer. The problem is that classic Pig scripts (no access to Hive tables, nor to HBase) run in a distributed way (they have mappers and reducers). However, this one is running on only one node (in Cloudera Manager -> Hosts all nodes have a Load Average of 0.* and one node has 9.* as load). Since you say that normally, even if only mappers are created, the script should still run in a distributed way, I will post an anonymised version of my script.
SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
SET hbase.zookeeper.quorum '${ZOOKEEPER_QUORUM}';
SET oozie.use.system.libpath true;
SET oozie.libpath '${PATH_LIB_OOZIE}';
------------------------------------------------------------
-- hcat
register 'hive-hcatalog-core-0.13.1-cdh5.3.0.jar';
register 'hive-hcatalog-core.jar';
register 'hive-hcatalog-pig-adapter-0.13.1-cdh5.3.0.jar';
register 'hive-hcatalog-pig-adapter.jar';
register 'hive-metastore-0.13.1-cdh5.3.0.jar';
register 'datanucleus-core-3.2.10.jar';
register 'datanucleus-api-jdo-3.2.6.jar';
register 'datanucleus-rdbms-3.2.9.jar';
register 'commons-dbcp-1.4.jar';
register 'commons-pool-1.5.4.jar';
register 'jdo-api-3.0.1.jar';
-- UDF
REGISTER 'MyStoreUDF-0.3.8.jar';
------------------------------------------------------------------------------------------------------------
----------------------------------------------- input data -------------------------------------------------
var_a = LOAD 'my_database.my_table' USING org.apache.hcatalog.pig.HCatLoader() AS
(
a:chararray ,
b:chararray,
c:chararray,
d:chararray,
e:chararray,
f:long,
g:chararray,
h:chararray,
i:long,
j:chararray,
k:bag{(name:chararray, value:chararray)},
l:chararray,
m:chararray );
var_a_filtered = FILTER var_a BY (a == 'abcd');
var_a_proj = FOREACH var_a_filtered GENERATE
a,
b,
c,
d;
STORE var_a_proj INTO 'hbaseTableName'
USING MyStoreUDF('-hbaseTableName1 hbaseTableName1 -hbaseTableName2 hbaseTableName2');
Thank you! Alina GHERMAN
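For reference, a minimal sketch (not part of the original script; the property values are only illustrative): with HCatLoader the number of map tasks follows the number of input splits, so if Pig combines the table's files into a single split the whole job lands on one mapper and one node. Disabling split combination, or lowering the combined split size, is one way to get more mappers:
-- hedged sketch: standard Pig properties, illustrative values
SET pig.splitCombination false;           -- keep one mapper per input split
SET pig.maxCombinedSplitSize 134217728;   -- or cap combined splits at 128 MB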
11-05-2015
01:55 AM
Hello, In http://<IP>/oozie/list_oozie_coordinators/ the Next Submission field is never updated. In fact it always contains the first submission of the job. Thank you! Alina GHERMAN
Labels: Apache Oozie
11-04-2015
11:29 PM
Hello, I have a Pig job that I schedule with Oozie. This Pig job reads data from a Hive table and writes into 3 HBase tables (via a UDF). The problem is that only one node is working. I noticed that this job has only mappers and no reducers. Is this the problem? I'm asking because of the thread: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Execute-Shell-script-through-oozie-job-in-all-node/m-p/33136#M1765 where @Sue said "The Oozie shell action is run as a Hadoop job with one map task and zero reduce tasks - the job runs on one arbitrary node in the cluster." Is there a way to force the cluster to use all the nodes? Thank you!
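As a hedged side note (the job id below is hypothetical): the quoted sentence describes the one-mapper Oozie launcher/shell job, while the Pig job it launches runs as its own MapReduce job whose mapper count follows the input splits; that count can be checked from the command line, for example:
# list running MapReduce jobs, then print the status (including number of maps) of one of them
mapred job -list
mapred job -status job_1446000000000_0001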
10-10-2015
01:34 AM
1 Kudo
Hello, I created my account with my client email and I would like to add my company email and my personal one. Is this possible? Thank you,
10-02-2015
07:01 AM
Indeed, TRUNCATE doesn't appear in http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_shell_commands.html
10-01-2015
06:34 AM
Hello, For me it didn't work without the python eggs thing. In order to be dynamic, I replaced some well-defined sequences in the shell script. For example, I had my Impala script looking something like this:
select * from ${table1};
select * from ${table2};
Then in the shell script I did something like:
sed "s/\${table1}/my_real_table_one/g;s/\${table2}/my_real_table_two/g;" $LOCAL_FILE_PATH > $LOCAL_FILE_WITH_VARIABLE_REPLACED
I hope this and my other post with the copyToLocal command for copying to local will help you. Alina GHERMAN
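A minimal end-to-end sketch of the same approach (the file paths and the impalad host below are placeholders, not from the post):
#!/bin/bash
# substitute the real table names into a copy of the templated query file
LOCAL_FILE_PATH=/tmp/queries.sql.template
LOCAL_FILE_WITH_VARIABLE_REPLACED=/tmp/queries.sql
sed "s/\${table1}/my_real_table_one/g;s/\${table2}/my_real_table_two/g;" "$LOCAL_FILE_PATH" > "$LOCAL_FILE_WITH_VARIABLE_REPLACED"
# run the resulting file with impala-shell
impala-shell -i my_impalad_host -f "$LOCAL_FILE_WITH_VARIABLE_REPLACED"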
09-30-2015
08:59 PM
Hello, I need to use the TRUNCATE command; however, when I run the following command TRUNCATE TABLE table_name PARTITION (datebymonth='2014-10') in impala-shell I get an error: ERROR: AnalysisException: Syntax error in line 6:
TRUNCATE TABLE user_events ...
^
Encountered: IDENTIFIER
Expected: ALTER, COMPUTE, CREATE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, USE, VALUES, WITH
In the Hue Impala editor everything is fine. Note: I need to use impala-shell because I add these queries to a cron job. The Impala version that I use: v2.1.4-cdh5. Cloudera version: CDH 5.3. Thank you!
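As a hedged aside (the "Expected:" list above shows that this Impala version does not recognize TRUNCATE at all): one workaround that stays within the accepted statements is to drop and re-create the partition instead, for example:
-- illustrative only; the table and partition column are taken from the error above
ALTER TABLE user_events DROP PARTITION (datebymonth='2014-10');
ALTER TABLE user_events ADD PARTITION (datebymonth='2014-10');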
09-29-2015
01:31 AM
1 Kudo
The problem was solved by changing the source from spooldir to http. I think there is a problem with the spooldir source.
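A minimal sketch of what the replacement HTTP source can look like (the agent and channel names are reused from the configuration posted below; the port is illustrative):
projectName.sources = http-source
projectName.sources.http-source.type = http
projectName.sources.http-source.bind = 0.0.0.0
projectName.sources.http-source.port = 44444
projectName.sources.http-source.channels = mem-channel-1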
09-09-2015
01:06 AM
I wanted to add one more piece of information: - in Cloudera Manager ==> Charts ==> if we do "select channel_fill_percentage_across_flume_channels", we are at a maximum of 0.0001%... Note: we have 2 channels, each with one sink and one source, both on the same machine. This means that the error/warning that we have in the logs is not what is really blocking Flume from working... Thank you!
09-08-2015
12:36 AM
Hello, Thank you. There are no errors delivered to HDFS. Note: - The interceptor is only normalizing some inputs. - I tried to add the thread number configuration to the sink, but with no success (there was no difference).
# source definition
projectName.sources.spooldir-source.type = spooldir
projectName.sources.spooldir-source.spoolDir = /var/flume/in
projectName.sources.spooldir-source.basenameHeader = true
projectName.sources.spooldir-source.basenameHeaderKey = basename
projectName.sources.spooldir-source.batchSize = 10
projectName.sources.spooldir-source.deletePolicy = immediate
# Max blob size: 1.5 GB
projectName.sources.spooldir-source.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
projectName.sources.spooldir-source.deserializer.maxBlobLength = 1610000000
# Attach the interceptor to the source
projectName.sources.spooldir-source.interceptors = json-interceptor
projectName.sources.spooldir-source.interceptors.json-interceptor.type = com.company.analytics.flume.interceptor.JsonInterceptor$Builder
# Define the event's headers. basenameHeader must be the same as source.basenameHeaderKey (default is basename)
projectName.sources.spooldir-source.interceptors.json-interceptor.basenameHeader = basename
projectName.sources.spooldir-source.interceptors.json-interceptor.resourceHeader = resources
projectName.sources.spooldir-source.interceptors.json-interceptor.ssidHeader = ssid
# channel definition
projectName.channels.mem-channel-1.type = memory
projectName.channels.mem-channel-1.capacity = 100000
projectName.channels.mem-channel-1.transactionCapacity = 1000
# sink definition
projectName.sinks.hdfs-sink-1.type = hdfs
projectName.sinks.hdfs-sink-1.hdfs.path = hdfs://StandbyNameNode/path/to/in
projectName.sinks.hdfs-sink-1.hdfs.filePrefix = %{resources}_%{ssid}
projectName.sinks.hdfs-sink-1.hdfs.fileSuffix = .json
projectName.sinks.hdfs-sink-1.hdfs.fileType = DataStream
projectName.sinks.hdfs-sink-1.hdfs.writeFormat = Text
projectName.sinks.hdfs-sink-1.hdfs.rollInterval = 3600
projectName.sinks.hdfs-sink-1.hdfs.rollSize = 63000000
projectName.sinks.hdfs-sink-1.hdfs.rollCount = 0
projectName.sinks.hdfs-sink-1.hdfs.batchSize = 1000
projectName.sinks.hdfs-sink-1.hdfs.idleTimeout = 60
# connect source and sink to channel
projectName.sources.spooldir-source.channels = mem-channel-1
projectName.sinks.hdfs-sink-1.channel = mem-channel-1
Would it help to add several identical sinks on the same machine? Thank you! Alina GHERMAN
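A hedged sketch of the idea in that last question (the second sink name and the choice of copied properties are assumptions, not a tested recommendation): several identical HDFS sinks can drain the same memory channel, which is a common way to raise an agent's drain rate:
# second, identical HDFS sink bound to the same channel
projectName.sinks = hdfs-sink-1 hdfs-sink-2
projectName.sinks.hdfs-sink-2.type = hdfs
projectName.sinks.hdfs-sink-2.hdfs.path = hdfs://StandbyNameNode/path/to/in
# a distinct filePrefix per sink avoids any chance of file-name clashes
projectName.sinks.hdfs-sink-2.hdfs.filePrefix = %{resources}_%{ssid}_2
projectName.sinks.hdfs-sink-2.hdfs.fileSuffix = .json
projectName.sinks.hdfs-sink-2.hdfs.fileType = DataStream
projectName.sinks.hdfs-sink-2.hdfs.writeFormat = Text
projectName.sinks.hdfs-sink-2.channel = mem-channel-1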