Explain Generator functions in Python?

Generator functions, introduced with PEP 255, are a special kind of Python function that returns a lazy iterator. Lazy iterators are objects that you can loop over like a list. Unlike lists, however, lazy iterators do not store their contents in memory. For an overview of iterators in Python, check out Python “for” Loops (Definite Iteration).
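As a minimal sketch of that idea, the hypothetical count_up_to() function below (the name is made up for illustration) uses yield to hand back one value at a time instead of building a list:

```python
def count_up_to(limit):
    """Yield the integers 1..limit one at a time (hypothetical example)."""
    current = 1
    while current <= limit:
        yield current  # execution pauses here until the next value is requested
        current += 1


gen = count_up_to(3)       # calling the function runs none of its body yet
first = next(gen)          # runs the body up to the first yield
remaining = list(gen)      # consumes the values that are left
print(first, remaining)    # prints: 1 [2, 3]
```

Nothing is computed until a value is requested, which is what "lazy" means here.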

Now that you have a rough idea of what a generator does, you might wonder what one looks like in action. Let’s take a look at two examples. In the first, you’ll see how generators work from a bird’s-eye view. Then you can zoom in and examine each example more closely.

Reading Large Files in Python

A common use case for generators is working with data streams or large files, such as CSV files. These text files separate data into columns using commas, and the format is a common way to share data. Now, what if you want to count the number of rows in a CSV file? The code block below shows one way of counting those rows.

csv_gen = csv_reader("some_csv.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

Looking at this example, you might expect csv_gen to be a list. To populate this list, csv_reader() opens a file and loads its contents into csv_gen. The program then iterates over the list and increments row_count for each row.

This is a reasonable explanation, but would this design still work if the file is very large? What if the file is larger than the memory you have available? To answer this question, let’s assume that csv_reader() just opens the file and reads it into a list.

def csv_reader(file_name):
    file = open(file_name)
    result = file.read().split("\n")
    return result

This function opens a given file and uses file.read() along with .split() to add each line to a list as a separate element. If you were to use this version of csv_reader() in the row counting code block you saw further up, with a file too large to fit in memory, you would get the following output:

Traceback (most recent call last):
  File "ex1_naive.py", line 22, in <module>
    main()
  File "ex1_naive.py", line 13, in main
    csv_gen = csv_reader("file.txt")
  File "ex1_naive.py", line 6, in csv_reader
    result = file.read().split("\n")
MemoryError

In this case, open() returns a file object that you can lazily iterate through line by line. However, file.read().split() loads everything into memory at once, causing the MemoryError.
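To make the difference concrete, here is a small sketch that assumes a temporary file created just for the demonstration (the file contents and variable names are illustrative). It shows that the object returned by open() is its own iterator and hands out one line at a time, while read().split() builds the entire list up front:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    path = tmp.name

with open(path) as file:
    is_own_iterator = iter(file) is file   # file objects are iterators
    first_line = next(file)                # only one line is returned here

with open(path) as file:
    all_lines = file.read().split("\n")    # the whole file becomes one list

os.remove(path)
print(is_own_iterator, repr(first_line), all_lines)
```

With three short lines the two approaches cost about the same. With a multi-gigabyte file, only the second one has to hold everything in memory at once.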

Before that happens, you would probably notice your computer slowing to a crawl. You might even need to kill the program with a KeyboardInterrupt. So, how can you handle these massive data files? Take a look at a new definition of csv_reader():

def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row

In this version, you open the file, iterate through it, and yield each row. Used with the row counting code block from earlier, this code should print the row count with no memory errors.
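As a hedged, self-contained sketch of the whole flow, the block below writes a small temporary file (its contents are made up for the demonstration) and then counts its rows with a generator version of csv_reader(). This variant wraps open() in a with statement so the file is closed when the generator finishes; that is an assumption added here, not part of the version above.

```python
import os
import tempfile


def csv_reader(file_name):
    """Yield the rows of a text file one at a time."""
    with open(file_name, "r") as file:  # assumption: close the file when done
        for row in file:
            yield row


# Hypothetical sample data, written to a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("name,age\nalice,30\nbob,25\n")
    path = tmp.name

row_count = 0
for row in csv_reader(path):
    row_count += 1

os.remove(path)
print(f"Row count is {row_count}")  # prints: Row count is 3
```

Only one row is held in memory at a time, so the same loop should work for files far larger than this sample.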

Default Asked on April 11, 2021 in Programming.