Streaming Twitter Data into HDFS Using Flume




Hi Team,

I am facing an issue with Flume when using Twitter as the source and HDFS as the sink.

On executing the flume-ng agent, I get the output below on the terminal:

x-i386-32 org.apache.flume.node.Application -n TwitterAgent conf -f /usr/local/flume/conf/flume-twitter.conf
16/12/09 09:37:13 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/12/09 09:37:13 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/usr/local/flume/conf/flume-twitter.conf
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/09 09:37:13 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/12/09 09:37:13 INFO node.AbstractConfigurationProvider: Creating channels
16/12/09 09:37:13 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/12/09 09:37:13 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/12/09 09:37:13 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
16/12/09 09:37:13 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/12/09 09:37:14 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
16/12/09 09:37:14 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [Twitter, HDFS]
16/12/09 09:37:14 INFO node.Application: Starting new configuration:{ sourceRunners:{Twitter=EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1846149 counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
16/12/09 09:37:14 INFO node.Application: Starting Channel MemChannel
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: MemChannel, registered successfully.
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
16/12/09 09:37:14 INFO node.Application: Starting Sink HDFS
16/12/09 09:37:14 INFO node.Application: Starting Source Twitter
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: HDFS, registered successfully.
16/12/09 09:37:14 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
16/12/09 09:37:14 INFO twitter4j.TwitterStreamImpl: Establishing connection.
16/12/09 09:37:48 INFO twitter4j.TwitterStreamImpl: Connection established.
16/12/09 09:37:48 INFO twitter4j.TwitterStreamImpl: Receiving status stream.
After this, the agent hangs and does not proceed any further. Kindly help me resolve it. It was working previously.

My agent configuration is as follows:

TwitterAgent.sources=Twitter
TwitterAgent.channels=MemChannel
TwitterAgent.sinks=HDFS

#Describe the source
TwitterAgent.sources.Twitter.type=com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel

TwitterAgent.sources.Twitter.consumerKey=**************************
TwitterAgent.sources.Twitter.consumerSecret=***************************************
TwitterAgent.sources.Twitter.accessToken=****************************************
TwitterAgent.sources.Twitter.accessTokenSecret=***********************************

#Twitter handles to search
TwitterAgent.sources.Twitter.keywords = G854gaurcity, awanishtiwari

TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://192.168.230.132:10001/Flume/Twitter
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600

TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100

Let me know if you need more information. A quick reply would be really helpful.


1 Answer(s)



Hi Gaurav,

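There is no issue with the configuration or its execution. The agent has connected and is waiting for tweets that match your keywords, and with keywords this specific, matching tweets may take a long time to arrive.

Please add more keywords.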
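
As a rough sketch of what that change could look like, the keywords line in the configuration might be widened with a few high-volume terms. The extra terms here (hadoop, bigdata, analytics) are placeholders chosen only for illustration, not values from the original post; substitute whatever terms suit your use case.

```properties
# Original narrow keywords plus some broader, high-traffic terms.
# hadoop, bigdata and analytics are illustrative placeholders only.
TwitterAgent.sources.Twitter.keywords = G854gaurcity, awanishtiwari, hadoop, bigdata, analytics
```

If tweets start arriving once broader keywords are in place, that would suggest the agent itself is working and the original terms were simply too rare to match quickly.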

Hope this helps,

Thanks.
