Large data sets, but in what form? Numbers, text, graphics? What about different languages?

Well, lots of data means lots of things.
Are we talking about all numeric data, all text data, or all graphics in bytes, etc.?
Anyway, my primary interest is how Hadoop helps with large data in different languages: French, Spanish, Arabic, Chinese?



Hi Naveen,

When we say lots of data, we mean data that can be text, numeric, graphics, audio, video, or other binary formats, typically amounting to gigabytes or terabytes.

Hadoop is not aware of what kind of data it stores; to HDFS a file is just a sequence of bytes. So you can store UTF-8 encoded text (Chinese, Arabic, etc.) and process it in Java or Python using the specific jars/modules available for handling international character sets.
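To make this concrete, here is a minimal sketch of a word count for Hadoop Streaming written in Python. It assumes the input files are UTF-8 encoded and that the script is passed to the streaming jar with something like `-mapper "python3 wordcount.py map"` and `-reducer "python3 wordcount.py reduce"`; the file name `wordcount.py` and the `map`/`reduce` arguments are hypothetical choices for this example. Hadoop only moves the bytes around; it is the script that decodes them as UTF-8.

```python
import io
import sys
from itertools import groupby
from typing import BinaryIO, Iterable, Iterator, Tuple


def map_tokens(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Whitespace splitting works for French, Spanish and Arabic, but Chinese
    # text has no spaces between words; a real job would plug in a word
    # segmentation library here.
    for line in lines:
        for token in line.split():
            yield token, 1


def reduce_counts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Streaming hands the reducer "key\tvalue" lines sorted by key,
    # so equal keys are adjacent and groupby can sum them.
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


def run(step: str, raw_in: BinaryIO, raw_out: BinaryIO) -> None:
    # Decode and encode explicitly as UTF-8 instead of relying on the
    # locale configured on each worker node.
    text_in = io.TextIOWrapper(raw_in, encoding="utf-8")
    text_out = io.TextIOWrapper(raw_out, encoding="utf-8", newline="\n")
    pairs = map_tokens(text_in) if step == "map" else reduce_counts(text_in)
    for key, value in pairs:
        text_out.write(f"{key}\t{value}\n")
    text_out.flush()
    text_out.detach()  # leave the underlying output stream open for the caller


# Only read stdin when invoked with a step argument ("map" or "reduce").
if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1], sys.stdin.buffer, sys.stdout.buffer)
```

The same code handles French, Arabic, or Chinese text without changes, because every line is decoded as UTF-8 before it is split. What does differ by language is tokenization, which is where language-specific libraries come in.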

Hope this helps.