Regarding file handling

I copied a poem by Rabindranath Tagore from the internet into Notepad and used that file to practise file reading. I got an error message that I am not able to solve. I checked that the file is not empty. Please help me.
The code and error message are as follows:


Output of print(f) is:
<_io.TextIOWrapper name='poem.txt' mode='r' encoding='UTF-8'>
Next line of code is:
The error message received is:


UnicodeDecodeError                        Traceback (most recent call last)

<ipython-input-16-ffa068f40971> in <module>()
----> 1

/usr/lib/python3.6/ in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 102: invalid start byte

Please try the following to open the file:

f = open("poem.txt", "r", errors='ignore')

Though it may be useful for emergency purposes, since errors='ignore' silently drops every byte that cannot be decoded, it's good practice to actually find out why this problem is occurring.
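One way to find out why is to read the file in binary mode and list the bytes that UTF-8 rejects. The following is only a rough sketch; the function names find_undecodable_bytes and report_file are made up for illustration and are not part of the original code.

```python
def find_undecodable_bytes(data, encoding="utf-8"):
    """Return a list of (offset, byte_value) for bytes that fail to decode.

    Hypothetical helper: it decodes the remaining data repeatedly and
    records the position reported by each UnicodeDecodeError.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            problems.append((bad, data[bad]))
            pos = bad + 1  # skip the offending byte and keep looking
    return problems


def report_file(path, encoding="utf-8"):
    """Print every undecodable byte in the file at `path`."""
    with open(path, "rb") as fh:  # binary mode: no decoding happens here
        data = fh.read()
    for offset, value in find_undecodable_bytes(data, encoding):
        print(f"offset {offset}: byte 0x{value:02x}")
```

Calling report_file("poem.txt") should then list the same kind of position and byte value that the traceback mentions, but for every problem byte instead of only the first one.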

This happened because you copied the text from somewhere.
I also faced it.
If you type the whole text out word by word, you will not see this error.


You can use this command
f = open("poem.txt", "r", encoding='utf-8')
if you are using Python 3 or higher.

This command does not work on Google Colaboratory. If the file is created by copying the poem from the internet, the error persists even with the command you suggested.

Yes, when I typed the whole poem word by word, my problem was solved.

But I think typing the whole document again word by word is not good practice. Does anyone know another valid way to solve my problem?

When I typed the poem word by word, the problem was solved. But can you tell me any other way to solve it?
Also, when I copy the poem from the internet and open the file in Notepad, it looks all right. So why is it a problem to read the file in Python?
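A possible explanation, offered as an assumption rather than a diagnosis: Notepad on Windows may have saved the file in the Windows-1252 ("ANSI") code page, while the file is being opened as UTF-8. In Windows-1252 the byte 0x85 is the single-character ellipsis "…", but in UTF-8 a lone 0x85 is not a valid start byte, which matches the error message. A minimal sketch that tries UTF-8 first and then falls back to cp1252 could look like this (decode_with_fallback is a hypothetical helper name):

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order.

    Returns (text, encoding_used). The candidate list is an assumption;
    cp1252 is only a guess at what Notepad used when saving the file.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


def read_text_file(path):
    """Read a file as bytes and decode it with the fallback above."""
    with open(path, "rb") as fh:
        return decode_with_fallback(fh.read())
```

If the fallback reports cp1252, then opening the file directly with open("poem.txt", "r", encoding="cp1252") should also work. If neither encoding fits, the file was probably saved in some other encoding and this guess does not apply.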

Where did you copy it from, @Mugdha?
Please share the link.

I just copy-pasted the text from the website and looked at its raw byte-by-byte data:
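For anyone who wants to repeat this kind of check, here is a small sketch of how the raw bytes might be inspected. It is not necessarily how the check above was done, and show_non_ascii is a made-up helper name.

```python
def show_non_ascii(data, context=10):
    """Return (offset, hex_value, surrounding_bytes) for each non-ASCII byte.

    `data` is a bytes object, for example the result of
    open("poem.txt", "rb").read(). `context` controls how many bytes
    on each side are included so the byte can be located in the text.
    """
    hits = []
    for i, b in enumerate(data):
        if b > 0x7F:  # anything above 0x7F is outside plain ASCII
            snippet = data[max(0, i - context): i + context + 1]
            hits.append((i, hex(b), snippet))
    return hits
```

Printing each entry of the returned list shows the suspicious byte together with the ordinary text around it.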

There seem to be some illegal or corrupt characters in the copied text.
Also, some of the full stops seem to be different from normal full stops.

So that is the reason; this sometimes happens when we copy and paste directly from the web.
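Instead of retyping the poem, the file could be converted once into clean UTF-8 with plain punctuation. The sketch below assumes the file was saved as Windows-1252 (an assumption, not something confirmed in this thread). The names REPLACEMENTS, normalize_punctuation and convert_to_utf8, and the output file name in the usage note, are hypothetical.

```python
# Typographic characters that often come along when copying from web pages,
# mapped to plain ASCII equivalents. The list is illustrative, not complete.
REPLACEMENTS = {
    "\u2026": "...",  # single-character ellipsis
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
}


def normalize_punctuation(text):
    """Replace the typographic characters above with ASCII punctuation."""
    return text.translate(str.maketrans(REPLACEMENTS))


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Read `src` in the assumed source encoding and write `dst` as UTF-8."""
    with open(src, "r", encoding=source_encoding) as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write(normalize_punctuation(text))
```

After something like convert_to_utf8("poem.txt", "poem_utf8.txt"), the new file should open with encoding='utf-8' without the error, provided the guess about the source encoding is right.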