When working with data in pandas, we usually apply many operations to get the data into the desired form before, for example, creating diagrams or moving on to the visualization phase. One such operation is remapping the values of a specific column in a DataFrame. This can be done in several ways.
The following example shows how, given a DataFrame containing data about events, we can remap the values of a specific column to new values using a dictionary.
The first step is to create a sample DataFrame with some dummy data:
import pandas as pd

# Create the DataFrame.
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})

# Print the DataFrame.
print(df)
Now we will replace the values of the Event column with their respective codes:
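# Dictionary used to remap the values in the DataFrame.
# It is named event_codes to avoid shadowing the built-in dict type.
event_codes = {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}

# Print the dictionary.
print(event_codes)

# Remap the values of the Event column.
# replace() returns a new DataFrame, so the result is assigned back to df.
df = df.replace({"Event": event_codes})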
map() Method
We can also use the map() method of the column to achieve this. As before, we start by creating the sample DataFrame:
import pandas as pd

# Create the DataFrame.
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})

# Print the DataFrame.
print(df)
Now we will replace the values of the 'Event' column with their respective codes:
# Dictionary used to remap the values.
# It is named event_codes to avoid shadowing the built-in dict type.
event_codes = {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}

# Print the dictionary.
print(event_codes)

# Remap the values of the Event column.
df['Event'] = df['Event'].map(event_codes)

# Print the DataFrame after modification.
print(df)
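One difference between the two approaches is worth a quick sketch: with a dictionary, map() turns any value that has no matching key into NaN, whereas replace() leaves such values untouched. The snippet below uses a hypothetical extra event, 'Dance', that is deliberately missing from the dictionary; the variable names are chosen for illustration only.

```python
import pandas as pd

# 'Dance' is a made-up value that is intentionally absent from the mapping.
events = pd.Series(['Music', 'Poetry', 'Dance'])
event_codes = {'Music': 'M', 'Poetry': 'P'}

# map() returns NaN for values that have no key in the dictionary.
mapped = events.map(event_codes)

# replace() keeps values that have no key in the dictionary.
replaced = events.replace(event_codes)

print(mapped.tolist())
print(replaced.tolist())
```

If unmapped values should survive a map() call, one option is to fill the gaps from the original column, for example events.map(event_codes).fillna(events).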
Function Approach
Another approach is to wrap the remapping in a Python function, in this case equivalent to the replace() approach, so that we can reuse it whenever we need it during our analysis. Here dict_labels maps each column name to its own dictionary of old and new values:
def remap(data, dict_labels):
    for field, values in dict_labels.items():
        print("I am remapping %s" % field)
        data.replace({field: values}, inplace=True)
    print("DONE")
    return data
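As a usage sketch, such a helper can be called with a dictionary that maps each column name to its own value mapping. The version below is a slightly adapted, self-contained variant: it returns a new DataFrame instead of modifying its argument, and the name remap_columns and the sample mapping are assumptions made for illustration.

```python
import pandas as pd


def remap_columns(data, column_mappings):
    """Return a copy of data with the given per-column value mappings applied."""
    # DataFrame.replace accepts a nested dict of the form {column: {old: new}}.
    return data.replace(column_mappings)


df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})

# Hypothetical mapping: one inner dictionary per column to remap.
labels = {'Event': {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}}

remapped = remap_columns(df, labels)
print(remapped)
```

Returning a new object keeps the original df available for comparison, at the cost of an extra copy of the data.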
Update Approach
Suppose di is a dictionary and df is a DataFrame. If the keys of di refer to index labels rather than to column values, you can use the update method:
df['col1'].update(pd.Series(di))
For example:
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['w', 10, 20], 'col2': ['a', 30, np.nan]}, index=[1, 2, 0])

  col1 col2
1    w    a
2   10   30
0   20  NaN

di = {0: "A", 2: "B"}
The value of col1 at index label 0 is replaced with 'A', and the value at index label 2 is replaced with 'B':
df['col1'].update(pd.Series(di))
print(df)

  col1 col2
1    w    a
2    B   30
0    A  NaN
Note how the keys in di are matched against index labels. The order of the index labels, that is, their positions in the DataFrame, does not matter.
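Calling update() directly on df['col1'] relies on that expression giving access to the column's underlying data. Depending on the pandas version, and in particular when Copy-on-Write is enabled, the change may not propagate back to df. A minimal sketch that avoids relying on this is to update a copy of the column and assign it back; the variable name col is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['w', 10, 20], 'col2': ['a', 30, np.nan]}, index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Work on an explicit copy of the column so the update does not depend
# on whether df['col1'] shares memory with df.
col = df['col1'].copy()

# update() aligns on index labels: label 0 gets 'A', label 2 gets 'B'.
col.update(pd.Series(di))

# Assign the updated column back to the DataFrame.
df['col1'] = col
print(df)
```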