Use cluster analysis to identify the groups of characteristically similar schools in the College Scorecard dataset. Considerations:
Clustering Algorithm
Data Preparation: How will you deal with missing values? Categorical variables? Feature intercorrelations? Feature normalization or scaling? Dimensionality reduction?
Hyperparameters: How will you set the parameters -- the algorithm's knobs and dials, so to speak -- in order to achieve valid and useful output?
Interpretation: Is it possible to explain what each cluster represents? Did you retain or prepare a set of features that enables a meaningful interpretation of the clusters? Do the compositions of the clusters seem to make sense?
Validation: How will you measure the validity of your clustering process? Which metrics will you use and how will you apply them?
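The considerations above can be made concrete with a small end-to-end sketch. It is only an illustration under stated assumptions: the three features (tuition, admission rate, enrollment) and all values are made up rather than taken from the College Scorecard, median imputation and z-score scaling are just one possible preparation choice, k-means is one possible algorithm, and the silhouette coefficient is one possible validation metric. The seeding rule is a simplification chosen to keep the toy run deterministic.

```python
import math
import statistics

def impute_median(rows):
    """Replace None in each column with that column's median."""
    cols = list(zip(*rows))
    medians = [statistics.median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means, seeded with evenly spaced rows (a simplification)."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 suggest well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or only one cluster overall
            continue
        a = statistics.fmean(own)                               # cohesion
        b = min(statistics.fmean(d) for d in dists.values())    # separation
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)

# Hypothetical schools: [tuition, admission_rate, enrollment]; None marks a missing value.
schools = [
    [45000, 0.10, 6000],
    [48000, None, 7000],
    [47000, 0.12, 6500],
    [9000, 0.80, 30000],
    [8500, 0.85, None],
    [9500, 0.75, 28000],
]

prepared = standardize(impute_median(schools))
labels = kmeans(prepared, k=2)
score = silhouette(prepared, labels)
print(labels, round(score, 3))
```

On real data, the number of clusters would typically be chosen by comparing such a validation score across several values of k, and the cluster means of the original (unscaled) features would be inspected to explain what each cluster represents.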
As children, many of us wished we could create our own language! But what if certain words always co-occur with one another in a corpus? You can build your own model that learns which words go together and which ones frequently appear near each other. All of this can be done with a custom embeddings model, which we create in this project.
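To illustrate the idea of words that "go together", here is a minimal count-based sketch. It is not the embeddings model built in the project: it simply counts which words appear within a small window of each other and treats each word's counts as a crude vector. The toy corpus, the window size, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(sentences, window=2):
    """Count how often each word appears within `window` tokens of another word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

vectors = cooccurrence(corpus)
print(vectors["cat"].most_common(3))
print(cosine(vectors["cat"], vectors["dog"]), cosine(vectors["cat"], vectors["milk"]))
```

Words used in similar contexts ("cat" and "dog" here) end up with more similar count vectors than words that merely appear together once. Learned embedding models compress this kind of co-occurrence signal into dense, low-dimensional vectors.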
CRNNs combine convolutional and recurrent architectures and are widely used in text detection and optical character recognition (OCR). In this project, we are going to use a CRNN architecture to recognize text in sample images. The data we are going to use is TRSynth100k from Kaggle. Given an image containing some text, the goal is to correctly identify that text using the CRNN architecture. We are going to train the model end-to-end from scratch.
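A CRNN typically emits one character prediction per horizontal slice of the image, so the per-slice predictions must be merged into a final string. The sketch below assumes a CTC-style output with a blank symbol, which is a common pairing with CRNNs but is not stated in the project description. The alphabet, the scores, and the function name are made up for illustration.

```python
BLANK = "-"  # hypothetical blank symbol separating repeated characters

def greedy_ctc_decode(frame_scores, alphabet):
    """Pick the best character per time step, collapse repeats, then drop blanks."""
    best = [alphabet[max(range(len(scores)), key=scores.__getitem__)]
            for scores in frame_scores]
    decoded = []
    prev = None
    for ch in best:
        if ch != prev and ch != BLANK:
            decoded.append(ch)
        prev = ch
    return "".join(decoded)

alphabet = [BLANK, "c", "a", "t"]

# Hypothetical per-time-step scores, one row per image slice, one column per symbol.
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # c
    [0.20, 0.60, 0.10, 0.10],  # c (repeat, collapsed)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.30, 0.10, 0.10, 0.50],  # t (repeat, collapsed)
]

text = greedy_ctc_decode(frame_scores, alphabet)
print(text)
```

The blank symbol is what lets such a decoder keep genuine double letters: two identical characters separated by a blank survive the collapse step, while adjacent duplicates from neighboring slices are merged.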
In this Spark project, we will continue building the data warehouse from the previous project, Yelp Data Processing Using Spark And Hive Part 1, and will do further data processing to develop diverse data products.