Streaming Twitter data into HDFS using Flume




Hi Team,

I am facing an issue with Flume when using Twitter as the source and HDFS as the sink.

On executing the flume-ng agent, I get the output below on the terminal:

x-i386-32 org.apache.flume.node.Application -n TwitterAgent conf -f /usr/local/flume/conf/flume-twitter.conf
16/12/09 09:37:13 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/12/09 09:37:13 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/usr/local/flume/conf/flume-twitter.conf
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/12/09 09:37:13 INFO node.AbstractConfigurationProvider: Creating channels
16/12/09 09:37:13 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/12/09 09:37:13 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/12/09 09:37:13 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
16/12/09 09:37:13 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/12/09 09:37:14 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
16/12/09 09:37:14 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [Twitter, HDFS]
16/12/09 09:37:14 INFO node.Application: Starting new configuration:{ sourceRunners:{Twitter=EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1846149 counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
16/12/09 09:37:14 INFO node.Application: Starting Channel MemChannel
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: MemChannel, registered successfully.
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
16/12/09 09:37:14 INFO node.Application: Starting Sink HDFS
16/12/09 09:37:14 INFO node.Application: Starting Source Twitter
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: HDFS, registered successfully.
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
16/12/09 09:37:14 INFO twitter4j.TwitterStreamImpl: Establishing connection.
16/12/09 09:37:48 INFO twitter4j.TwitterStreamImpl: Connection established.
16/12/09 09:37:48 INFO twitter4j.TwitterStreamImpl: Receiving status stream.
After this, it hangs and does not proceed any further. Kindly help me resolve it. It was working previously.

My configuration file (flume-twitter.conf) is below:

TwitterAgent.sources=Twitter
TwitterAgent.channels=MemChannel
TwitterAgent.sinks=HDFS

#Describe the source
TwitterAgent.sources.Twitter.type=com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel

TwitterAgent.sources.Twitter.consumerKey=**************************
TwitterAgent.sources.Twitter.consumerSecret=***************************************
TwitterAgent.sources.Twitter.accessToken=****************************************
TwitterAgent.sources.Twitter.accessTokenSecret=***********************************

#Twitter handles to search
TwitterAgent.sources.Twitter.keywords = G854gaurcity, awanishtiwari

TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://192.168.230.132:10001/Flume/Twitter
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600

TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100

Let me know if you need more information. A quick reply would be really helpful.


1 Answer(s)



Hi Gaurav,

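There is no issue with your configuration or with how you run the agent. The log shows the stream is connected ("Receiving status stream"), so the agent is simply waiting for tweets that match your keywords, and that can take a long time when the keywords are rarely used.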

Please add more keywords.
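
For example, here is a rough sketch of how the keywords line in flume-twitter.conf could look with a few more terms added. The extra terms (hadoop, bigdata, analytics) are only placeholders I picked for illustration, not something from your setup, so replace them with whatever you actually want to track:

```properties
# Comma-separated list of terms to track on the Twitter stream.
# The first two are from your original config; the remaining ones
# are hypothetical, higher-volume examples added for illustration.
TwitterAgent.sources.Twitter.keywords = G854gaurcity, awanishtiwari, hadoop, bigdata, analytics
```

After changing the file, restart the agent and give it some time. With more frequently used keywords, events should start arriving sooner.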

Hope this helps,

Thanks.