Understand your data before you do anything with it. How is it structured? What does each column mean? What values does each column take? Collect the data. Clean it. Visualize it. Look at the 10 most frequent values in each column. Study outliers. Check distributions and missing values. If a column is too fragmented, group similar values. Look for correlations. Cluster. Classify. Debug.
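The per-column checks above (missing values, most frequent values, outliers, grouping fragmented values) can be sketched with Python's standard library. This is a minimal sketch that assumes rows arrive as a list of dicts, with `None` marking a missing value. The sample data, the function names (`column`, `profile_column`, `group_rare`), the 1.5 × IQR outlier rule and the `min_count` threshold are all illustrative choices, not part of the original advice.

```python
from collections import Counter
import statistics

# Hypothetical sample data: each row is a dict, None marks a missing value.
SAMPLE_ROWS = [
    {"city": "Paris", "age": 34},
    {"city": "paris", "age": 29},
    {"city": "Lyon", "age": None},
    {"city": "Paris", "age": 41},
    {"city": None, "age": 38},
    {"city": "Nice", "age": 250},
]


def column(rows, name):
    """Pull one column out of a list of row dicts (absent keys count as missing)."""
    return [row.get(name) for row in rows]


def profile_column(values, top_n=10):
    """Count missing values, list the top_n most frequent values and,
    for purely numeric columns, flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    summary = {
        "missing": len(values) - len(present),
        "top_values": Counter(present).most_common(top_n),
    }
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    # Only treat the column as numeric if every present value is a number
    # and there are enough points to compute quartiles.
    if len(numeric) >= 2 and len(numeric) == len(present):
        q1, _, q3 = statistics.quantiles(numeric, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summary["outliers"] = [v for v in numeric if v < low or v > high]
    return summary


def group_rare(values, min_count=2, other="other"):
    """Collapse values seen fewer than min_count times into one bucket,
    leaving missing values (None) untouched."""
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


for name in ("city", "age"):
    print(name, profile_column(column(SAMPLE_ROWS, name)))

# Normalize case first so "Paris" and "paris" count as one value,
# then bucket the remaining rare cities.
cities = [c.strip().lower() if c is not None else None
          for c in column(SAMPLE_ROWS, "city")]
print(group_rare(cities))
```

On this toy data the age of 250 stands out as an outlier and "Paris"/"paris" show up as a fragmented value worth merging. In practice a library such as pandas (`value_counts`, `isna`, `describe`) covers the same ground; the stdlib version just makes each check explicit.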