I have a couple of interview questions to follow up on:
1. The first question is about monitoring cluster tasks and debugging cluster issues, using Elasticsearch as the example. They run Elasticsearch live on multiple clusters (on streaming data, say), 24/7. Once in a while, a user sees a problem with the task on one of the clusters and most likely has to restart that cluster. So the questions are: how do we identify the problem, and can that be automated? Can we write a batch script that restarts the task or cluster automatically when an issue occurs? They did not give any concrete example and expect me to have relevant experience.
2. They use Hadoop and PySpark to aggregate the data and load the time series, then run the Python algorithms, then analyze the results in Kibana. They are already satisfied with my ability in Python algorithms (thanks to the Data Science session), but they expect me to understand the data flow and architecture. The Hadoop/Spark material has covered most of the content, but I do not yet have an integrated picture of the architecture. Could you give me some guidance and relevant resources to study?
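To make question 1 more concrete, here is a rough sketch of the kind of watchdog I imagine they mean. It is only my guess, not something they described. It polls the Elasticsearch `_cluster/health` endpoint and restarts the service after several consecutive bad checks. The node address, the `systemctl` unit name, the thresholds, and the choice to treat `yellow` as healthy are all my own assumptions.

```python
import json
import subprocess
import time
import urllib.error
import urllib.request

# Everything below is assumed for illustration, not taken from the interview.
HEALTH_URL = "http://localhost:9200/_cluster/health"    # assumed node address
RESTART_CMD = ["systemctl", "restart", "elasticsearch"]  # assumed systemd unit
MAX_FAILURES = 3     # consecutive bad checks before a restart
POLL_SECONDS = 30    # delay between checks


def fetch_status(url=HEALTH_URL, timeout=5.0):
    """Return 'green', 'yellow' or 'red', or None if the node does not answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status")
    except (urllib.error.URLError, OSError, ValueError):
        return None


def next_failure_count(status, failures):
    """Count consecutive bad checks; a green or yellow answer resets the count."""
    return 0 if status in ("green", "yellow") else failures + 1


def should_restart(failures, max_failures=MAX_FAILURES):
    """Restart only after several bad checks in a row, not on a single blip."""
    return failures >= max_failures


def watch():
    """Poll forever and restart the service when the failure threshold is hit."""
    failures = 0
    while True:
        failures = next_failure_count(fetch_status(), failures)
        if should_restart(failures):
            subprocess.run(RESTART_CMD, check=False)
            failures = 0
        time.sleep(POLL_SECONDS)
```

Calling `watch()` from a small service would run the loop. I kept the decision logic in separate functions so it can be checked without a live cluster. What I cannot judge is whether a blind restart like this is acceptable in practice, or whether the cause should be logged and diagnosed first.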
My questions are a little abstract because I only received them as hints, in an informal chat, for the next step with other people. Any suggestions on how to approach them and what to look into would be highly appreciated. Thanks a lot.