Twitter flume practice




Hi, I started the agent with the Twitter source, the Cloudera Twitter jar, and a Flume config containing my Twitter API keys:

flume-ng agent -n TwitterAgent -c conf -f flume_tweet.conf 

Files with JSON contents were written to HDFS:

[cloudera@localhost ~]$ hadoop fs -ls tweets/year=2017/month=05/dt=14/dthour=10
Found 1 items
-rw-r--r--   3 cloudera cloudera      85809 2017-05-14 10:59 tweets/year=2017/month=05/dt=14/dthour=10/FlumeData.1494784752589

But I also got an error on the agent console. How can I fix this?

17/05/14 10:59:27 WARN hdfs.BucketWriter: Block Under-replication detected. Rotating file.
17/05/14 10:59:27 INFO hdfs.BucketWriter: Renaming /user/cloudera/tweets/year=2017/month=05/dt=14/dthour=11/FlumeData.1494784753750.tmp to /user/cloudera/tweets/year=2017/month=05/dt=14/dthour=11/FlumeData.1494784753750
17/05/14 10:59:28 INFO hdfs.BucketWriter: Creating /user/cloudera/tweets/year=2017/month=05/dt=14/dthour=11//FlumeData.1494784753751.tmp
17/05/14 10:59:28 ERROR hdfs.HDFSEventSink: process failed
org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 100 full, consider committing more frequently, increasing capacity, or increasing thread count
	at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doTake(MemoryChannel.java:102)
	at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
	at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:350)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
17/05/14 10:59:28 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 100 full, consider committing more frequently, increasing capacity, or increasing thread count
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 100 full, consider committing more frequently, increasing capacity, or increasing thread count
	at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doTake(MemoryChannel.java:102)
	at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
	at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:350)
	... 3 more
17/05/14 10:59:33 WARN hdfs.BucketWriter: Block Under-replication detected. Rotating file.

 


2 Answer(s)



I solved it by raising the memory channel's transactionCapacity from 100 to 1000.

The Twitter stream has high velocity, so 100 events per transaction was too small.

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
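
For context, a minimal sketch of how the channel and sink settings probably relate. The "Take list ... capacity 100 full" error is raised when a sink tries to take more events in one transaction than the channel's transactionCapacity allows, so the HDFS sink's hdfs.batchSize should not exceed it. The sink name HDFS and the batch size of 1000 are assumptions, since the original flume_tweet.conf was not posted; use the names and values from your own file.

```properties
# Memory channel: transactionCapacity caps the number of events
# a single put or take transaction may hold.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# HDFS sink (the sink name "HDFS" is hypothetical).
# Keep hdfs.batchSize <= the channel's transactionCapacity.
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000

# Optional, and an assumption about the setup: on a single-node VM the
# "Block Under-replication detected" warning may be reduced by lowering
# the replica count the sink expects.
TwitterAgent.sinks.HDFS.hdfs.minBlockReplicas = 1
```

Restart the agent after editing the file so the new values are picked up.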

 

 

 

 



Hi Masashi,

Your solution is correct; you just need to increase the transaction capacity.

Thanks for the update.
