how to drop duplicate rows in pandas

1 Answer(s)

Gagan Preet

Strictly speaking, only the rows at index 4 and 5 are duplicates. However, since you want to treat rows as duplicates based on columns b and c only, you can group by b and c and collapse each group into a single row.
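If it is enough to keep one of the existing rows for each (b, c) pair, pandas' built-in drop_duplicates with the subset argument is a more direct alternative to groupby. A minimal sketch follows; the sample frame is made up for illustration, since the data from the question is not shown here.

```python
import pandas as pd

# Hypothetical sample data (not the frame from the question)
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": ["x", "x", "y", "y"],
    "c": [10, 10, 20, 30],
    "d": [5, 7, 1, 2],
})

# Rows count as duplicates when they share the same values in b and c.
# keep="first" keeps the first row of each duplicate set, keep="last" the last.
first_kept = df.drop_duplicates(subset=["b", "c"], keep="first")
last_kept = df.drop_duplicates(subset=["b", "c"], keep="last")

print(first_kept)
print(last_kept)
```

Unlike the groupby approach, this keeps whole rows intact and preserves their original index labels.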

Here are some alternatives, depending on what you need. Compare column d across the results to see what each one does. (Also note that the original index is lost.)

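# One row per (b, c) pair, taking the maximum of every other column
df.groupby(['b', 'c']).max().reset_index()

# One row per (b, c) pair, taking the minimum of every other column
df.groupby(['b', 'c']).min().reset_index()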
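To see how the two results differ, here is a small sketch on made-up data (the frame below is an assumption, not the one from the question). Note that max and min are computed column by column, so a result row can combine values that came from different original rows.

```python
import pandas as pd

# Hypothetical sample data (not the frame from the question)
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": ["x", "x", "y", "y"],
    "c": [10, 10, 20, 30],
    "d": [7, 5, 1, 2],
})

# Rows 0 and 1 share (b, c) == ("x", 10), so they collapse into one row
mx = df.groupby(["b", "c"]).max().reset_index()
mn = df.groupby(["b", "c"]).min().reset_index()

print(mx)  # for ("x", 10): a comes from row 1, d comes from row 0
print(mn)  # for ("x", 10): a comes from row 0, d comes from row 1
```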

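# Keep every distinct non-null value of the remaining columns,
# joined with '|' (most frequent value first) for each (b, c) group
group_cols = ['b', 'c']
other_cols = [c for c in df.columns if c not in group_cols]
df.groupby(group_cols).apply(lambda d: d[other_cols].apply(lambda col: '|'.join(str(v) for v in col.value_counts().index)))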
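The same idea can also be sketched with agg instead of a nested apply. This is a variation, not the exact snippet above: it uses unique(), so values are joined in order of first appearance rather than by frequency, and NaN values are not dropped. The sample frame and the helper name join_unique are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data (not the frame from the question)
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": ["x", "x", "y", "y"],
    "c": [10, 10, 20, 30],
    "d": [7, 5, 1, 2],
})

group_cols = ["b", "c"]
other_cols = [c for c in df.columns if c not in group_cols]


def join_unique(values: pd.Series) -> str:
    """Join the distinct values of one column within one group using '|'."""
    return "|".join(str(v) for v in values.unique())


# agg calls join_unique once per column per group
joined = df.groupby(group_cols)[other_cols].agg(join_unique).reset_index()
print(joined)
```

With this data, the two rows sharing ("x", 10) collapse into one row whose a and d hold the joined strings, while groups with a single row keep their single value as a string.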

Does this help you?

Oct 23 2018 12:23 PM
