Doubt in Pandas DataFrame .get() function

Can someone please explain in detail what the .get() function does?
Please also explain it in the context of the code pasted below, from the FDS hands-on session.

short_names  = {}
for s in df.method.unique():
    short_names[s] = ''.join([x[0] for x in s.split(' ')])
print (short_names)

for i, r in  df.iterrows():
    df.loc[i, 'short_method'] = short_names.get(r['method'], r['method'])

The code above is for reference. Please explain what r['method'] refers to, and what happens when it is written twice versus once. (I know it gives a null value if the second parameter is missing and the value is null, but I am not able to understand how it actually works in the context of this code.)

Hi, I will try my best to explain this.

The code below should already be clear to you:

short_names = {}
for s in df.method.unique():
    short_names[s] = ''.join([x[0] for x in s.split(' ')])

This creates a dictionary of short form names of the method column in the planets dataset. The key of the dictionary is the long form, the value is short form. “Radial velocity” is a key, “RV” is its value. And so on.
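To see what the dictionary looks like, here is a small sketch that runs the same loop on a plain list. The three method names are only an assumed sample standing in for df.method.unique(), not the full column.

```python
# Assumed sample of method names, standing in for df.method.unique()
methods = ["Radial Velocity", "Transit", "Eclipse Timing Variations"]

short_names = {}
for s in methods:
    # Split the long name into words and join the first letter of each word
    short_names[s] = ''.join([x[0] for x in s.split(' ')])

print(short_names)
# {'Radial Velocity': 'RV', 'Transit': 'T', 'Eclipse Timing Variations': 'ETV'}
```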

The code below:

for i, r in df.iterrows():
    df.loc[i, 'short_method'] = short_names.get(r['method'], r['method'])

What this does is:

  1. i runs over all of df's index labels.
  2. r runs over all of df's rows; each r is a Series holding one row.
  3. We are setting a new column 'short_method' in df. For each row, we set it to the value stored in the dictionary short_names for the key r['method']. r['method'] is the long name of the method in that row, and we use it as the key to look up in short_names. .get() returns the value for that key; if the key is not in the dictionary, it returns the second argument instead, which is r['method'] in this case, so the long name is kept unchanged. That is why r['method'] appears twice: the first is the key to look up, the second is the fallback. With only one argument, .get() returns None for a missing key. You could also write short_names[key_name], but that raises a KeyError if the key is not found, whereas .get() does not. Note that .get() here is the Python dictionary method, since short_names is a dict. It is good to learn such idioms in Python programming.
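The three lookup styles described above can be compared side by side. This is a minimal sketch with an assumed two-entry dictionary; "Astrometry" is just a hypothetical key that is deliberately left out.

```python
# Assumed small dictionary; "Astrometry" is intentionally not a key
short_names = {"Radial Velocity": "RV", "Transit": "T"}

# Key present: .get() returns the stored value
known = short_names.get("Transit", "Transit")            # 'T'

# Key absent, default given: .get() returns the second argument
fallback = short_names.get("Astrometry", "Astrometry")   # 'Astrometry'

# Key absent, no default: .get() returns None
missing = short_names.get("Astrometry")                  # None

# Key absent with square brackets: raises KeyError
try:
    short_names["Astrometry"]
    raised = False
except KeyError:
    raised = True

print(known, fallback, missing, raised)
```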

Hope this answers your question

Thank you so much for such a clear explanation. God bless you!

[quote=“shresth.mishra.cse22, post:1, topic:7280”]
for i, r in df.iterrows():
[/quote]

Iterating through large pandas DataFrame objects is generally slow. Row-by-row iteration defeats the whole purpose of using a DataFrame. It is an anti-pattern and something you should only do when you have exhausted every other option. It is better to look for a list comprehension, a vectorized solution, or the DataFrame.apply() method.

Pandas DataFrame loop using list comprehension example

result = [(x, y, z) for x, y, z in zip(df['column_1'], df['column_2'], df['column_3'])]
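Applied to the short_method column from the original question, one possible way to avoid iterrows() is sketched below. The tiny DataFrame and dictionary are assumed sample data, and Series.map followed by fillna is one option among several, not necessarily what the course intends.

```python
import pandas as pd

# Assumed sample data; "Astrometry" is deliberately missing from the dictionary
df = pd.DataFrame({"method": ["Radial Velocity", "Transit", "Astrometry"]})
short_names = {"Radial Velocity": "RV", "Transit": "T"}

# Vectorized: map() looks up every value in the dict (missing keys become NaN),
# then fillna() falls back to the original long name for those rows
df["short_method"] = df["method"].map(short_names).fillna(df["method"])

# List comprehension: same result, using dict.get() with a default
short_list = [short_names.get(m, m) for m in df["method"]]

print(df)
```

Both versions give the same values as the iterrows() loop on this sample, without assigning one cell at a time through df.loc.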