Member since: 10-24-2017
Posts: 101
Kudos Received: 14
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 929 | 07-26-2017 09:57 PM |
| | 871 | 12-13-2016 12:08 PM |
| | 214 | 07-28-2016 08:41 PM |
| | 1440 | 06-15-2016 07:57 AM |
12-14-2018
08:39 AM
1 Kudo
How can I set up the ListFile processor so that it excludes certain folders? For example, if we have the following directories: /root/A, /root/B, /root/C, /root/D, how can I exclude folders C and B in the path filter? Thanks, Ahmad
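A minimal sketch, assuming the Input Directory is /root and Recurse Subdirectories is true: ListFile's Path Filter property takes a Java regular expression matched against each subdirectory's path relative to the Input Directory, so a negative lookahead can skip B and C while everything else is still scanned:

Path Filter: ^(?!(B|C)($|/)).*

Any subdirectory whose relative path starts with B or C (including anything nested under them) is then excluded from the listing; the File Filter property, by contrast, only matches file names.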
05-25-2018
10:38 AM
It's still not working, unfortunately.
05-17-2018
07:27 AM
@Kiran Nittala I have thousands of flowfiles in queues in other processors. Will I lose them if I restart NiFi? Thanks, Ahmad
05-16-2018
09:10 AM
I am getting this error with the FetchFile or GetFile processors:
2018-05-16 12:04:15,133 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ListFile[id=6804e073-0163-1000-56d7-1d884aef90a1] due to uncaught Exception: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /mnt/win/ArchivesCD/08103DOC.CFL/P1-Especifica????es.pdf
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /mnt/win/ArchivesCD/08103DOC.CFL/P1-Especifica????es.pdf
How can I resolve this? And does the processor abandon the entire process, or does it skip the "malformed" file? My NiFi version is 1.6.0. Thanks, Ahmad
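A hedged workaround rather than a confirmed fix: this exception typically means the JVM's file-name encoding does not match the bytes the filesystem returns (the ???? in the path are the giveaway), so forcing UTF-8 in NiFi's conf/bootstrap.conf is worth trying. The argument numbers below are placeholders; use the next unused java.arg.N slots in your file:

java.arg.20=-Dfile.encoding=UTF-8
java.arg.21=-Dsun.jnu.encoding=UTF-8

Restart NiFi afterwards; setting LANG=en_US.UTF-8 (or another UTF-8 locale) in the environment that launches NiFi targets the same mismatch. As for the second question, the "Administratively Yielding" line indicates the uncaught exception yields the whole processor rather than skipping just that one file.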
10-24-2017
11:38 PM
Does Sentry allow us to override ACLs of files in HDFS, similar to the way Apache Ranger does? Can we manage permissions as to who can access HDFS directories and files using Sentry? Thanks, Ahmad
07-26-2017
09:57 PM
This is the code I came up with; is there a better approach?
val ds = filteredDF.as[(Integer, String, String, String, String, Double, Integer)]
val df = ds.flatMap {
  case (x1, x2, x3, x4, x5, x6, x7) => x3.split(",").map((x1, x2, _, x4, x5, x6, x7))
}.toDF
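For reference, a sketch of an alternative that stays in the DataFrame API instead of round-tripping through a typed Dataset; the column name "c3" is a placeholder for whichever column holds the comma-delimited values:

import org.apache.spark.sql.functions.{col, explode, split}

// split turns the comma-delimited string into an array; explode emits one row per element
val exploded = filteredDF.withColumn("c3", explode(split(col("c3"), ",")))

This avoids spelling out the full tuple type and leaves the other columns untouched.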
07-26-2017
06:33 PM
I am working with Scala and I have a DataFrame where one of the columns contains several values delimited by a comma. How can I turn these rows ["1", "x,y,z", "A"] ["2", "x,y", "B"] into ["1", "x", "A"] ["1", "y", "A"] ["1", "z", "A"] ["2", "x", "B"] ["2", "y", "B"]?
07-04-2017
10:07 AM
I am using the PutHiveStreaming processor and I am getting the following error when I run it. My cluster is Kerberized and I have specified the Hive principal and keytab.
2017-07-04 13:03:34,211 ERROR [put-hive-streaming-0] o.apache.thrift.transport.TSaslTransport SASL negotiation failure
javax.security.sasl.SaslException: No common protection layer between client and server
What could the problem be?
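A hedged reading of this error: "No common protection layer between client and server" usually means the SASL quality-of-protection the client offers does not match what the server demands, e.g. the cluster runs with hadoop.rpc.protection=privacy while the client negotiates authentication only. Aligning the QOP on both sides is worth checking; on the HiveServer2 side the relevant property in hive-site.xml looks like this:

<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <!-- auth, auth-int, or auth-conf; should line up with the cluster's hadoop.rpc.protection -->
  <value>auth-conf</value>
</property>

For PutHiveStreaming specifically, the hive-site.xml referenced by the processor's Hive Configuration Resources property needs to carry the same value the cluster uses.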
05-05-2017
03:00 PM
I am getting this error when I try to save a DataFrame into a file:
17/05/05 17:19:20 ERROR DefaultWriterContainer: Job job_201705051719_0000 aborted.
Traceback (most recent call last):
File "/opt/sqlscrapper.py", line 24, in <module>
df.write.format("orc").save("/tmp/orc_query_output")
File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 397, in save
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:154)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, db-hdp-dn2.darbeirut.com): java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:53)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
import os
from pyspark import SparkConf,SparkContext
from pyspark.sql import HiveContext
import pandas as pd
conf = (SparkConf()
.setAppName("data_import")
.set("spark.dynamicAllocation.enabled","true")
.set("spark.shuffle.service.enabled","true"))
sc = SparkContext(conf = conf)
sqlctx = HiveContext(sc)
df = sqlctx.load(
source="jdbc",
url="jdbc:sqlserver://db-sqltech:1433;database=WebUsage;user=username;password=password",
dbtable="EmployeeMobiles",
properties={"driver": 'com.sqlserver.jdbc.Driver'})
df.write.format("orc").save("/tmp/orc_query_output")
df.write.mode('overwrite').format('orc').saveAsTable("WebLog")
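The root cause is in the trace: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver, i.e. the executors cannot find the SQL Server JDBC driver jar (note also that the driver string in the script, com.sqlserver.jdbc.Driver, does not match the class the error is looking for). A hedged sketch of the usual fix; the jar path below is a placeholder for wherever your Microsoft JDBC jar actually lives:

# ship the JDBC driver jar to both the driver and the executors
# (/opt/jars/sqljdbc42.jar is an assumption; point at your actual jar)
spark-submit \
  --jars /opt/jars/sqljdbc42.jar \
  --driver-class-path /opt/jars/sqljdbc42.jar \
  /opt/sqlscrapper.py

In the script itself, "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver" is the class name the connector expects.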
05-05-2017
02:28 PM
Hey, I am getting the following error:
17/05/05 17:19:20 ERROR DefaultWriterContainer: Job job_201705051719_0000 aborted.
Traceback (most recent call last):
File "/opt/sqlscrapper.py", line 24, in <module>
df.write.format("orc").save("/tmp/orc_query_output")
File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 397, in save
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/2.5.3.0-37/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "/usr/hdp/2.5.3.0-37/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:154)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:106)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, db-hdp-dn2.darbeirut.com): java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:53)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
04-06-2017
01:36 PM
I am trying the ListHDFS processor; for some reason it is only retrieving around 5000 files.
04-06-2017
08:12 AM
Hello, every time I face an error in my NiFi workflow, the GetHDFS processor recrawls the HDFS directory right from the beginning. I want to keep the files where they are in HDFS (Keep Source File = true). How can I have the GetHDFS processor continue from where it stopped? Thanks
03-24-2017
07:35 AM
But the problem is that I am referring to the same server (localhost in your example). My SolrCloud is composed of 5 servers; how can I make sure I am querying the live nodes in case some of them are down?
03-23-2017
01:05 PM
Hello, I have a SolrCloud of five servers. How should client applications query Solr while ensuring load balancing?
For example, if the client application explicitly queries node A and this node gets disconnected, the client application will stop working.
Is there a master node I can point to that would distribute queries across the active nodes?
Thanks
Ahmad
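There is no master node in SolrCloud; the standard answer is a ZooKeeper-aware client, which reads cluster state from ZooKeeper and only sends requests to live nodes. A sketch with SolrJ's CloudSolrClient (the ZooKeeper addresses and collection name are placeholders, and the constructor shown is the Solr 5.x/6.x style; newer releases use a Builder instead):

import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.impl.CloudSolrClient

// point at the ZooKeeper ensemble, not at any single Solr node
val zkHost = "zk1:2181,zk2:2181,zk3:2181/solr"
val client = new CloudSolrClient(zkHost)
client.setDefaultCollection("myCollection")

// the client picks a live replica for each request and fails over automatically
val response = client.query(new SolrQuery("*:*"))
println(response.getResults.getNumFound)
client.close()

For clients that cannot use SolrJ, a load balancer (or simple round-robin over the node list) in front of the cluster achieves a similar effect, since any SolrCloud node can route a request to the right shards.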
03-23-2017
09:32 AM
2 Kudos
Hello, I have multiple SolrClouds distributed across various regions and cities. Is there a way to join them under a single alias or have them unified? They have the same collection and the same schema. As per the link below, the Solr cross-data-center replication feature is not available when the Solr index is stored in HDFS. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462 Thanks, Ahmad
03-07-2017
06:53 AM
I have tried sending documents using Solr's REST API and I got the exact same error. The problem isn't with zip files: if a zip file contains PDF or Word documents, for example, the zip is indexed fine. However, if the zip file contains an .mdb file, Solr fails to index it. Is it possible to have Solr ignore only the unsupported extensions rather than ignoring the entire document or file?
03-07-2017
06:51 AM
I'm using the PutSolrContentStream processor. Solr only fails on certain extension types (.mdb, for example). When an email or a zip file contains an .mdb file, the entire document fails to get pushed to Solr. Is there a way to have Solr index the email or zip file and ignore only the unsupported extensions rather than rejecting the entire document?
03-06-2017
10:28 AM
Hello, I am using the /update/extract request handler to push documents into Solr. I am getting this error with certain types of documents, and these documents end up being ignored by Solr. I have discovered that these files are emails (.msg) with zip attachments containing unsupported documents (I'm assuming). Is there a way to have Solr ignore the zip file rather than ignoring the entire file itself? Thanks
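One option to investigate, offered as an assumption about your setup: the ExtractingRequestHandler supports an ignoreTikaException parameter, which indexes whatever metadata was extracted instead of rejecting the whole document when Tika chokes on an embedded file. In solrconfig.xml that would look roughly like:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- keep the document even if Tika throws on an embedded attachment -->
    <str name="ignoreTikaException">true</str>
  </lst>
</requestHandler>

It can also be passed per request as a query parameter (ignoreTikaException=true), which is handy for testing before changing the config.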
12-27-2016
03:48 PM
I am receiving them as MIME files. They are archived in HDFS and I have a GetHDFS processor retrieving them. I am pushing them into SolrCloud using the PutSolrContentStream processor with the /update/extract handler.
12-27-2016
01:51 PM
I want to extract attachments using the ExtractEmailAttachments NiFi processor and then merge the content of the email and the attachments into a single flowfile. How can I do that?
12-23-2016
08:08 AM
I'm streaming emails from a folder; how can I convert them into JSON?
12-22-2016
10:16 AM
I'm using the PutSolrContentStream processor to push emails into my SolrCloud. I understand that the processor uses the Tika parser to extract fields from emails. How can I have these fields renamed or updated before pushing them to Solr?
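A hedged sketch: with Content Stream Path set to /update/extract, PutSolrContentStream forwards its user-added (dynamic) properties to Solr as request parameters, and the ExtractingRequestHandler understands fmap.<source>=<target> parameters for renaming fields that Tika extracts. So dynamic properties along these lines on the processor may do the renaming (the Tika-produced field names on the left are assumptions; check the actual names in your Solr logs):

fmap.from = email_from
fmap.subject = email_subject

Rewriting field values, as opposed to renaming them, is normally done on the Solr side in an update request processor chain.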
12-21-2016
03:20 PM
I did; now I'm getting this:
2016-12-21 15:16:44.124 ERROR (qtp1450821318-15) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'status' action
at org.apache.solr.handler.admin.CoreAdminOperation$4.call(CoreAdminOperation.java:192)
at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:354)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:153)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:676)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:439)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: file: BlockDirectory(HdfsDirectory@hdfs://db-hdp-nn1.com:8020/user/solr/CFRepo/core_node2/data/index lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@1451c9ff)
12-21-2016
07:53 AM
I did; I am getting this error now:
2016-12-21 07:52:44.394 ERROR (qtp1450821318-19) [c:CentralFiles s:shard2 r:core_node4 x:CentralFiles_shard2_replica2] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: no servers hosting shard: shard1
at org.apache.solr.handler.component.HttpShardHandler.prepDistributed(HttpShardHandler.java:451)
at org.apache.solr.handler.component.SearchHandler.getAndPrepShardHandler(SearchHandler.java:215)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:241)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
12-20-2016
06:10 PM
I am pushing emails on HDFS into SolrCloud using NiFi (GetHDFS and PutSolrContentStream). I am getting the below error in Solr (solrconfig.xml); what does this mean?
2016-12-20 17:55:45.332 ERROR (qtp1450821318-264936) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'status' action
at org.apache.solr.handler.admin.CoreAdminOperation$4.call(CoreAdminOperation.java:192)
at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:354)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:153)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:676)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:439)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /opt/lucidworks-hdpsearch/solr/server/solr/NifiCollection_shard2_replica1/data/index/segments_cm
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:127)
at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:592)
at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:886)
at org.apache.solr.handler.admin.CoreAdminOperation$4.call(CoreAdminOperation.java:188)
... 27 more
12-20-2016
04:42 PM
I am getting files from HDFS using the GetHDFS processor and pushing them into SolrCloud using the PutSolrContentStream processor. I want to push the path of the file I am retrieving into a new field in SolrCloud. If I check the attributes of the files retrieved by the GetHDFS processor, I can't see an attribute containing the full path of the file. If I use the GetFile processor, however, there is an attribute named "absolute.path" which contains the path of the file. How can I get the path attribute of the files I am retrieving from HDFS using the GetHDFS processor?
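One hedged workaround: GetHDFS does write path and filename attributes, but path is relative to the configured Directory, which is likely why the full path isn't visible. An UpdateAttribute processor between GetHDFS and PutSolrContentStream can rebuild an absolute path by prepending the directory the processor was configured with (the /data/archive prefix below is a placeholder):

absolute.hdfs.path: /data/archive/${path}/${filename}

The new attribute can then be sent into a Solr field, e.g. via a literal.<fieldname> dynamic property on PutSolrContentStream, assuming your schema has such a field and the processor evaluates expression language in that property.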
12-20-2016
04:20 PM
Can I update the _text_ field and have it stored and indexed? Also, is it possible to update the values and rename the fields using the PutSolrContentStream processor? I want to be able to store the location of the file I'm pulling from HDFS in a field in Solr.
12-20-2016
04:01 PM
Thank you, Bryan. I tried searching on the body of the email and I got results. I was under the impression that we can't search on fields that are not stored.
12-20-2016
12:55 PM
I am using the PutSolrContentStream processor to push emails (.msg) into my SolrCloud. I have put "/update/extract" in the Content Stream Path property in order to extract fields from the .msg files using the Tika parser. All the fields associated with the emails have been extracted (e.g. From, To, CC, Subject), with the exception of the body of the email. How can I have the processor push the body of the email as well? I am able to extract the content of the email and the metadata programmatically using the SolrNet library; how can I do the same with the PutSolrContentStream processor?