Data Frame (planet) Week 10

In week 10 of the data science foundations course, a solution was provided for querying the planets dataset, that is, filtering it according to certain conditions. Two methods were taught:

  1. A for loop was used, but the result didn't match mine, which I cross-checked with the DataFrame describe() method.

  2. DataFrame (boolean) indexing with the bitwise ‘&’ operator was used, and in this case the results didn't match either.

The dataset used was the one available in the seaborn library.

So, my questions are: first, has anyone faced the same issue, so that I can clear up my doubts? Second, why was the bitwise ‘&’ operator used, and how does it work?
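For reference, here is a minimal sketch of how & and | behave in pandas, using made-up values (hypothetical, not the planets data). On boolean Series, pandas applies & and | element-wise, row by row, which is exactly what a filter needs. Python's own and/or instead ask for a single True/False for the whole Series, which pandas refuses with a ValueError.

```python
import pandas as pd

# Hypothetical toy columns, not taken from the planets dataset
year = pd.Series([2008, 2011, 2013])
method = pd.Series(['Transit', 'Imaging', 'Radial Velocity'])

recent = year >= 2010                      # [False, True, True]
rv_or_transit = (method == 'Radial Velocity') | (method == 'Transit')  # [True, False, True]

# & combines the two boolean Series element by element
both = recent & rv_or_transit
print(both.tolist())                       # [False, False, True]

# `and` needs one truth value for the whole Series, so pandas raises ValueError
try:
    recent and rv_or_transit
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)                          # True
```

On plain integers, & and | are genuine bitwise operators; pandas reuses them for element-wise logic on boolean data because and/or cannot be overloaded.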

Any sort of help is welcome.

Thank you

Hi @naveed3923,
Can you share your notebook?

I have posted the details in this thread. Please have a look.

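   import seaborn as sns
   import pandas as pd
   import numpy as np

   # Load the planets dataset bundled with seaborn
   df_1 = sns.load_dataset('planets')

   # Drop every row that has a missing value in any column
   df_1.dropna(inplace=True)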
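One possible source of mismatched describe() numbers (an assumption, not confirmed in this thread) is whether missing values are dropped before the percentile is computed. dropna() removes a row if any column is NaN, while quantile() only skips NaN in the column it is called on, so the two orders give different 75th percentiles. A small sketch on a hypothetical mini-table:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table: row 1 is missing 'mass', row 4 is missing 'distance'
toy = pd.DataFrame({
    'distance': [10.0, 20.0, 30.0, 40.0, np.nan],
    'mass':     [1.0, np.nan, 2.0, 3.0, 4.0],
})

# quantile() skips NaN, so here it uses the four known distances
p75_raw = toy['distance'].quantile(0.75)

# dropna() removes every row with a NaN in ANY column (rows 1 and 4)
clean = toy.dropna()
p75_clean = clean['distance'].quantile(0.75)

print(len(toy), len(clean))   # 5 3
print(p75_raw, p75_clean)     # 32.5 35.0
```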

The task was to select the rows where the year is in the 2010s, the method is Radial Velocity or Transit, and the distance is greater than the 75th percentile. This was done with two different approaches, shown below.

FIRST APPROACH:

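   df_2 = df_1.copy()
   # 75th percentile of distance, computed on the cleaned data
   per_75 = df_1['distance'].quantile(0.75)

   # Collect the labels of rows that fail any condition, then drop them
   # all at once instead of modifying df_2 while iterating over it
   to_drop = []
   for i, r in df_2.iterrows():
       if r['year'] < 2010:
           to_drop.append(i)
       elif r['method'] not in ('Radial Velocity', 'Transit'):
           to_drop.append(i)
       elif r['distance'] <= per_75:  # exact opposite of "distance > per_75"
           to_drop.append(i)
   df_2.drop(to_drop, inplace=True)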
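In a drop-based loop, the drop condition has to be the exact negation of the keep condition. For "distance > per_75" that negation is "distance <= per_75", not "distance < per_75". The two only differ for a row whose distance equals the percentile exactly, which can happen because quantile() may return an actual data value. A sketch with hypothetical distances:

```python
import pandas as pd

# Hypothetical distances; with 5 values the 75th percentile lands exactly on one of them
d = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
p = d.quantile(0.75)               # 40.0, an actual data value

keep_by_mask = d[d > p]            # strictly greater: excludes 40.0
keep_if_drop_lt = d[~(d < p)]      # what "drop if < p" leaves behind: includes 40.0
keep_if_drop_le = d[~(d <= p)]     # "drop if <= p" is the exact negation of "> p"

print(keep_by_mask.tolist())       # [50.0]
print(keep_if_drop_lt.tolist())    # [40.0, 50.0]
print(keep_if_drop_le.tolist())    # [50.0]
```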

SECOND APPROACH:

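    df_2 = df_1.copy()
    per_75 = df_1['distance'].quantile(0.75)

    # Each comparison is wrapped in parentheses because & and | bind more
    # tightly than >=, == and >; & means "all must hold", | means "either one"
    df_2 = df_2[(df_2['year'] >= 2010)
                & ((df_2['method'] == 'Radial Velocity') | (df_2['method'] == 'Transit'))
                & (df_2['distance'] > per_75)]

    df_2.describe()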
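To check that the two approaches select the same rows, one option is to run both on the same input and compare with DataFrame.equals. The sketch below uses a hypothetical stand-in for the planets table (same column names, made-up values), so it runs without downloading the seaborn data:

```python
import pandas as pd

# Hypothetical stand-in for the planets table (same column names, made-up values)
df_1 = pd.DataFrame({
    'method':   ['Radial Velocity', 'Transit', 'Imaging', 'Transit', 'Radial Velocity', 'Transit'],
    'year':     [2005, 2012, 2013, 2011, 2014, 2010],
    'distance': [15.0, 180.0, 120.0, 30.0, 200.0, 60.0],
})
per_75 = df_1['distance'].quantile(0.75)   # 165.0 for these values

# Loop version: collect labels of rows that fail any condition, then drop them
to_drop = [i for i, r in df_1.iterrows()
           if r['year'] < 2010
           or r['method'] not in ('Radial Velocity', 'Transit')
           or r['distance'] <= per_75]
loop_result = df_1.drop(to_drop)

# Mask version: keep rows where all three conditions hold
mask_result = df_1[(df_1['year'] >= 2010)
                   & ((df_1['method'] == 'Radial Velocity') | (df_1['method'] == 'Transit'))
                   & (df_1['distance'] > per_75)]

print(loop_result.equals(mask_result))   # True
print(mask_result.index.tolist())        # [1, 4]
```

The mask version is usually preferred: it is vectorized, so it avoids a Python-level loop over every row.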

Although I obtained the correct results, I couldn't understand how both approaches produce the same output, and I also didn't quite get how the bitwise and/or operators (& and |) work or why they are used.
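The reason both approaches agree: the loop drops a row as soon as any one condition fails, while the mask keeps a row only when all conditions hold. By De Morgan's law, "not (a and b)" is the same as "(not a) or (not b)", so both describe the same rows. Indexing with df[mask] then keeps exactly the rows where the mask is True. A sketch with two hypothetical conditions:

```python
import pandas as pd

# Hypothetical three-row frame
df = pd.DataFrame({'year': [2008, 2011, 2013], 'distance': [50.0, 10.0, 90.0]})

a = df['year'] >= 2010        # [False, True, True]
b = df['distance'] > 20       # [True, False, True]

keep = a & b                  # mask approach: keep rows where ALL conditions are True
drop = ~a | ~b                # loop approach: drop a row if ANY condition fails

# De Morgan's law, checked row by row: ~(a & b) equals ~a | ~b
print((~keep).equals(drop))       # True

# df[mask] keeps the rows whose mask value is True
print(df[keep].index.tolist())    # [2]
```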

Hi @naveed3923,
I guess your doubt is about operator precedence.
I'm assuming you're not sure which operator will be evaluated first.
If that's the case, please refer to the precedence table below.
Note that precedence runs from top to bottom: an operator from a higher row in the table is evaluated before any operator from a lower row.
Also, if two operators from the same row are used in one expression, they are evaluated left to right (except **, which groups right to left).
Example: * and / share a row, so whichever comes first is evaluated first: a * b / c means (a * b) / c, while a / b * c means (a / b) * c.
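The same rules explain the parentheses in the pandas filter. In Python, & binds more tightly than comparison operators such as >=, == and >, so without parentheses the expression is grouped the wrong way. A small sketch using hypothetical toy columns:

```python
import pandas as pd

# * and / share a precedence level, so they are applied left to right
print(8 / 2 * 4)     # (8 / 2) * 4 -> 16.0, not 8 / (2 * 4) = 1.0
print(8 / 4 / 2)     # (8 / 4) / 2 -> 1.0
print(2 ** 3 ** 2)   # ** is the exception: 2 ** (3 ** 2) -> 512

# & binds tighter than comparisons, so this is 5 > (1 & 3), i.e. 5 > 1
print(5 > 1 & 3)     # True

# Hypothetical toy columns
year = pd.Series([2008, 2012])
distance = pd.Series([100.0, 300.0])

# Without parentheses, Python reads this as the chained comparison
# year >= (2010 & distance) > 200, which pandas cannot evaluate
try:
    year >= 2010 & distance > 200
    failed = False
except (TypeError, ValueError):
    failed = True
print(failed)        # True

# With parentheses, each comparison runs first and & combines the results row by row
combined = (year >= 2010) & (distance > 200)
print(combined.tolist())   # [False, True]
```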