Flume - Getting an error when the file is written to HDFS



0
I installed Flume, added the flume-conf.properties file, and started the Flume agent.
When I put the employees.csv file in the /home/cloudera/flumeSpool directory, I can see that the Flume agent picks up the file and tries to copy it into HDFS at /user/cloudera/flume, but I get the following error.

14/11/10 13:57:00 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
14/11/10 13:57:01 INFO hdfs.BucketWriter: Creating /user/cloudera/flume//FlumeData.1415656620523.tmp
14/11/10 13:57:09 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:46)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:456)

Detailed Error message:

14/11/10 13:56:22 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /home/cloudera/flumeSpool
14/11/10 13:56:22 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: hdfs-sink, registered successfully.
14/11/10 13:56:22 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started
14/11/10 13:56:22 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SOURCE, name: src-1, registered successfully.
14/11/10 13:56:22 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: src-1 started
14/11/10 13:56:57 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/cloudera/flumeSpool/employees.csv to /home/cloudera/flumeSpool/employees.csv.COMPLETED
14/11/10 13:57:00 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
14/11/10 13:57:01 INFO hdfs.BucketWriter: Creating /user/cloudera/flume//FlumeData.1415656620523.tmp
14/11/10 13:57:09 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:46)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:456)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)

8 Answer(s)


0

I see an issue here: the temp file being created, "/user/cloudera/flume//FlumeData.1415656620523.tmp", has two slashes (//). Check the path set for agent.sinks.hdfs-sink.hdfs.path. I guess you have it as "/user/cloudera/flume/"; if so, try changing it to "/user/cloudera/flume" without the trailing slash:

agent.sinks.hdfs-sink.hdfs.path = /user/cloudera/flume
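For reference, a minimal sketch of the relevant sink properties (hdfs.filePrefix is optional and defaults to FlumeData; it is shown here only to illustrate how the file name under hdfs.path is formed, which is why a trailing slash on the path shows up as // in the created file):

agent.sinks.hdfs-sink.hdfs.path = /user/cloudera/flume
agent.sinks.hdfs-sink.hdfs.filePrefix = FlumeData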

0

Thanks Sravan for checking!
I changed it to agent.sinks.hdfs-sink.hdfs.path = /user/cloudera/flume, but I still get the same error:

14/11/10 16:35:12 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: src-1 started
14/11/10 16:35:27 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/cloudera/flumeSpool/employees.csv to /home/cloudera/flumeSpool/employees.csv.COMPLETED
14/11/10 16:35:27 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
14/11/10 16:35:28 INFO hdfs.BucketWriter: Creating /user/cloudera/flume/FlumeData.1415666127523.tmp
14/11/10 16:35:35 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:46)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:456)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)

0

Please note I am using CDH 4.4.

0

Hi Thomas,

Please post your Flume config, the steps you used to create the directories, and the command you used to run Flume.

Thanks

0

Steps:
mkdir /home/cloudera/flumeSpool

sudo su
cp /home/cloudera/class2/flume-conf.properties /usr/lib/flume-ng/apache-flume-1.4.0-bin/conf/

hadoop fs -mkdir /user/cloudera/flume/

cd /usr/lib/flume-ng/apache-flume-1.4.0-bin/bin/

./flume-ng agent -n agent -c conf -f /usr/lib/flume-ng/apache-flume-1.4.0-bin/conf/flume-conf.properties

cp /home/cloudera/class2/input/employees.csv /home/cloudera/flumeSpool/
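
A quick sanity check (hedged; these are just the paths used in the steps above) is to confirm that the local spool directory exists and that the HDFS target directory was created:

ls -ld /home/cloudera/flumeSpool
hadoop fs -ls /user/cloudera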

flume-conf.properties:
/usr/lib/flume-ng/apache-flume-1.4.0-bin/conf/flume-conf.properties

# example.conf: A single-node Flume configuration

# Name the components on this agent
agent.sources = src-1
agent.sinks = hdfs-sink
agent.channels = memory-channel

# Source properties: a spooling-directory source that takes data from /home/cloudera/flumeSpool
agent.sources.src-1.type = spooldir
agent.sources.src-1.spoolDir = /home/cloudera/flumeSpool
agent.sources.src-1.fileHeader = true

# Use a channel which buffers events in memory
agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 1000
agent.channels.memory-channel.transactionCapacity = 100

# Sink properties: an HDFS sink which will store the data
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /user/cloudera/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.rollCount = 20

# Bind the source and sink to the channel
agent.sources.src-1.channels = memory-channel
agent.sinks.hdfs-sink.channel = memory-channel

This is the message I am getting - java.lang.NoSuchMethodError:

14/11/11 22:07:11 INFO hdfs.BucketWriter: Creating /user/cloudera/flume/FlumeData.1415772430830.tmp
14/11/11 22:07:19 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:46)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:456)

0

Hi Thomas,
What is the reason for installing Flume separately? The Cloudera VM already has Flume installed. The Flume example that was provided works fine; it was tested with the Flume that comes with the Cloudera VM.

We suggest that you try it with the default Flume conf that was provided by DeZyre.
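
For reference, a minimal sketch of starting the agent with the Flume that ships with the Cloudera VM, assuming the usual CDH package locations (flume-ng on the PATH and the config directory at /etc/flume-ng/conf; adjust the config file name to whatever you actually use):

flume-ng agent -n agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume-conf.properties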

Thanks

0

That's it!

I tried the Flume that comes with the Cloudera VM and it's working fine.
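
A quick way to confirm the sink is writing (assuming the same HDFS target directory as in the config above) is to list it and look for the rolled FlumeData files:

hadoop fs -ls /user/cloudera/flume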

Thanks for your help!

0

Thomas,

Before trying any new versions, it's always safer to work with the versions that ship with the VM. Once you gain proficiency you can try newer versions, but you also need to know what changes are coming in the various releases.
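
For what it's worth, a NoSuchMethodError on com.google.common.cache.CacheBuilder.build() is the classic symptom of a Guava version conflict: the standalone Flume 1.4.0 tarball ships its own Guava jar in its lib/ directory, and if that copy is older than the one the CDH 4.4 Hadoop client was built against, this exact error appears when the HDFS sink opens a DFSClient. A hedged way to check which Guava jars each side is picking up (the paths below are the usual tarball and CDH package locations and may differ on your VM):

find /usr/lib/flume-ng/apache-flume-1.4.0-bin/lib -name 'guava*.jar'
find /usr/lib/hadoop* -name 'guava*.jar'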

Thanks
