How to Read Line in a Console Python

How to extract specific portions of a text file using Python

Updated: 06/xxx/2020 by Computer Hope


Extracting text from a file is a common task in scripting and programming, and Python makes it easy. In this guide, we'll discuss some simple ways to extract text from a file using the Python 3 programming language.

Make sure you're using Python 3

In this guide, we'll be using Python version 3. Most systems come pre-installed with Python 2.7. While Python 2.7 is used in legacy code, Python 3 is the present and future of the Python language. Unless you have a specific reason to write or support Python 2, we recommend working in Python 3.

For Microsoft Windows, Python 3 can be downloaded from the official Python website. When installing, make sure the "Install launcher for all users" and "Add Python to PATH" options are both checked, as shown in the image below.

Installing Python 3.7.2 for Windows.

On Linux, you can install Python 3 with your package manager. For instance, on Debian or Ubuntu, you can install it with the following command:

sudo apt-get update && sudo apt-get install python3

For macOS, the Python 3 installer can be downloaded from python.org, as linked above. If you are using the Homebrew package manager, it can also be installed by opening a terminal window (Applications → Utilities), and running this command:

brew install python3

Running Python

On Linux and macOS, the command to run the Python 3 interpreter is python3. On Windows, if you installed the launcher, the command is py. The commands on this page use python3; if you're on Windows, substitute py for python3 in all commands.

Running Python with no options starts the interactive interpreter. For more information about using the interpreter, see Python overview: using the Python interpreter. If you accidentally enter the interpreter, you can leave it using the command exit() or quit().

Running Python with a file name will interpret that Python program. For example:

python3 program.py

...runs the program contained in the file program.py.

Okay, how can we use Python to extract text from a text file?

Reading data from a text file

First, let's read a text file. Let's say we're working with a file named lorem.txt, which contains lines from the Lorem Ipsum example text.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.

Note

In all the examples that follow, we work with the four lines of text contained in this file. Copy and paste the Latin text above into a text file, and save it as lorem.txt, so you can run the example code using this file as input.

A Python program can read a text file using the built-in open() function. For instance, the Python 3 program below opens lorem.txt for reading in text mode, reads the contents into a string variable named contents, closes the file, and prints the data.

myfile = open("lorem.txt", "rt")  # open lorem.txt for reading text
contents = myfile.read()          # read the entire file to string
myfile.close()                    # close the file
print(contents)                   # print string contents

Here, myfile is the name we give to our file object.

The "rt" parameter in the open() function means "we're opening this file to read text data."

The hash mark ("#") means that everything after it on that line is a comment, and it's ignored by the Python interpreter.

If you save this program in a file called read.py, you can run it with the following command.

python3 read.py

The command above outputs the contents of lorem.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.

Using "with open"

It's important to close your open files as soon as possible: open the file, perform your operation, and close it. Don't leave it open for extended periods of time.

When you're working with files, it's good practice to use the with open...as compound statement. It's the cleanest way to open a file, operate on it, and close the file, all in one easy-to-read block of code. The file is automatically closed when the code block completes.

Using with open...as, we can rewrite our program to look like this:

with open('lorem.txt', 'rt') as myfile:  # Open lorem.txt for reading text
    contents = myfile.read()             # Read the entire file to a string
print(contents)                          # Print the string

Note

Indentation is important in Python. Python programs use white space at the beginning of a line to define scope, such as a block of code. We recommend you use four spaces per level of indentation, and that you use spaces rather than tabs. In the following examples, make sure your code is indented exactly as it's presented here.

Example

Save the program as read.py and execute it:

python3 read.py

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.
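To confirm that the with block really closes the file, one option is to check the file object's closed attribute inside and after the block. The sketch below is only an illustration: it writes a small throwaway file with the tempfile module (an assumption, so that it runs without lorem.txt) and removes it at the end.

```python
import os
import tempfile

# Create a small throwaway file so the sketch does not depend on lorem.txt.
with tempfile.NamedTemporaryFile("wt", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")
    path = tmp.name

with open(path, "rt") as myfile:
    contents = myfile.read()
    print(myfile.closed)    # False: we are still inside the with block

print(myfile.closed)        # True: the file was closed when the block ended
os.remove(path)             # clean up the throwaway file
```

The variable myfile still exists after the block, but the file it refers to is closed, so further reads from it would fail.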

Reading text files line-by-line

In the examples so far, we've been reading in the whole file at once. Reading a full file is no big deal with small files, but generally speaking, it's not a great idea. For one thing, if your file is bigger than the amount of available memory, you'll encounter an error.

In almost every case, it's a better idea to read a text file one line at a time.

In Python, the file object is an iterator. An iterator is a type of Python object which behaves in certain ways when operated on repeatedly. For example, you can use a for loop to operate on a file object repeatedly, and each time the same operation is performed, you'll receive a different, or "next," result.

Example

For text files, the file object iterates one line of text at a time. It considers one line of text a "unit" of data, so we can use a for...in loop statement to iterate one line at a time:

with open('lorem.txt', 'rt') as myfile:  # Open lorem.txt for reading
    for myline in myfile:                # For each line, read to a string,
        print(myline)                    # and print the string.

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Nunc fringilla arcu congue metus aliquam mollis.

Mauris nec maximus purus. Maecenas sit amet pretium tellus.

Quisque at dignissim lacus.

Notice that we're getting an extra line break ("newline") after every line. That's because two newlines are being printed. The first one is the newline at the end of every line of our text file. The second newline happens because, by default, print() adds a line break of its own at the end of whatever you've asked it to print.
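One way to see the newline that belongs to each line is to print its repr(), which shows escape sequences instead of rendering them. This is a minimal sketch; io.StringIO and its two sample lines are stand-ins for the real file (an assumption, so the snippet runs on its own).

```python
import io

# Stand-in for lorem.txt (an assumption so the sketch is self-contained).
sample = io.StringIO("first line\nsecond line\n")

shown = []
for myline in sample:
    shown.append(repr(myline))   # repr() makes the trailing "\n" visible
    print(repr(myline))
```

Each printed line ends with a visible \n inside the quotes, which is the first of the two newlines described above.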

Let's store our lines of text in a variable (specifically, a list variable) so we can look at it more closely.

Storing text data in a variable

In Python, lists are similar to, but not the same as, an array in C or Java. A Python list contains indexed data, of varying lengths and types.

Example

mylines = []                             # Declare an empty list named mylines.
with open('lorem.txt', 'rt') as myfile:  # Open lorem.txt for reading text data.
    for myline in myfile:                # For each line, stored as myline,
        mylines.append(myline)           # add its contents to mylines.
print(mylines)                           # Print the list.

The output of this program is a little different. Instead of printing the contents of the list, this program prints our list object, which looks like this:

Output:

['Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n', 'Nunc fringilla arcu congue metus aliquam mollis.\n', 'Mauris nec maximus purus. Maecenas sit amet pretium tellus.\n', 'Quisque at dignissim lacus.\n']

Here, we see the raw contents of the list. In its raw object form, a list is represented as a comma-delimited list. Here, each element is represented as a string, and each newline is represented as its escape character sequence, \n.

Much like a C or Java array, the list elements are accessed by specifying an index number after the variable name, in brackets. Index numbers start at zero; in other words, the nth element of a list has the numeric index n-1.

Note

If you're wondering why the index numbers start at zero instead of one, you're not alone. Computer scientists have debated the usefulness of zero-based numbering systems in the past. In 1982, Edsger Dijkstra gave his opinion on the subject, explaining why zero-based numbering is the best way to index data in computer science. You can read the memo yourself; he makes a compelling argument.

Example

We can print the first element of mylines by specifying index number 0, contained in brackets after the name of the list:

print(mylines[0])

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Example

Or the third line, by specifying index number 2:

print(mylines[2])

Output:

Mauris nec maximus purus. Maecenas sit amet pretium tellus.

But if we try to access an index for which there is no value, we get an error. Our list has four elements, so the valid index numbers are 0 through 3:

Example

print(mylines[4])

Output:

Traceback (most recent call last):
  File <filename>, line <linenum>, in <module>
    print(mylines[4])
IndexError: list index out of range
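If an out-of-range index is a real possibility in your program, one option is to catch the IndexError rather than let the program stop. The sketch below is a minimal illustration: get_line is a hypothetical helper name, and the three-element list is a stand-in for lines read from a file.

```python
# Hypothetical stand-in for the lines read from a file.
mylines = ["first", "second", "third"]

def get_line(lines, index):
    """Return lines[index], or None if the index is out of range."""
    try:
        return lines[index]
    except IndexError:
        return None

print(len(mylines))            # number of elements in the list
print(get_line(mylines, 0))    # the first element
print(get_line(mylines, 3))    # None: valid indexes here are 0 through 2
```

Comparing the index against len(mylines) before indexing would work as well; which approach reads better depends on the surrounding code.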

Example

A list object is iterable, so to print every element of the list, we can iterate over it with for...in:

mylines = []                              # Declare an empty list
with open('lorem.txt', 'rt') as myfile:   # Open lorem.txt for reading text.
    for line in myfile:                   # For each line of text,
        mylines.append(line)              # add that line to the list.
    for element in mylines:               # For each element in the list,
        print(element)                    # print it.

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Nunc fringilla arcu congue metus aliquam mollis.

Mauris nec maximus purus. Maecenas sit amet pretium tellus.

Quisque at dignissim lacus.

But we're still getting extra newlines. Each line of our text file ends in a newline character ('\n'), which is being printed. Also, after printing each line, print() adds a newline of its own, unless you tell it to do otherwise.

We can change this default behavior by specifying an end parameter in our print() call:

print(element, end='')

By setting end to an empty string (two single quotes, with no space), we tell print() to print nothing at the end of a line, instead of a newline character.

Example

Our revised program looks like this:

mylines = []                              # Declare an empty list
with open('lorem.txt', 'rt') as myfile:   # Open file lorem.txt
    for line in myfile:                   # For each line of text,
        mylines.append(line)              # add that line to the list.
    for element in mylines:               # For each element in the list,
        print(element, end='')            # print it without extra newlines.

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.

The newlines you see here are actually in the file; they're a special character ('\n') at the end of each line. We want to get rid of these, so we don't have to worry about them while we process the file.

How to strip newlines

To remove the newlines completely, we can strip them. To strip a string is to remove one or more characters, usually whitespace, from either the beginning or end of the string.

Tip

This process is sometimes also called "trimming."

Python 3 string objects have a method called rstrip(), which strips characters from the right side of a string. The English language reads left-to-right, so stripping from the right side removes characters from the end.

If the variable is named mystring, we can strip its right side with mystring.rstrip(chars), where chars is a string of characters to strip. For instance, "123abc".rstrip("bc") returns 123a.
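A few quick checks can make the behavior of rstrip() concrete. The sketch below uses only literal strings; the variable names are arbitrary and chosen for illustration.

```python
a = "123abc".rstrip("bc")     # strips trailing "c" and "b", stops at "a"
b = "hello\n".rstrip("\n")    # strips the trailing newline only
c = "hello  \n".rstrip()      # no argument: strips all trailing whitespace
d = "abcabc".rstrip("cb")     # chars is a set of characters, not a suffix

print(a)          # 123a
print(repr(b))    # 'hello'
print(repr(c))    # 'hello'
print(d)          # abca
```

Note that rstrip() returns a new string; the original string is left unchanged.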

Tip

When you represent a string in your program with its literal contents, it's called a string literal. In Python (as in most programming languages), string literals are always quoted: enclosed on either side by single (') or double (") quotes. In Python, single and double quotes are equivalent; you can use one or the other, as long as they match on both ends of the string. It's traditional to represent a human-readable string (such as Hello) in double-quotes ("Hello"). If you're representing a single character (such as b), or a single special character such as the newline character (\n), it's traditional to use single quotes ('b', '\n'). For more information about how to use strings in Python, you can read the documentation of strings in Python.

The statement string.rstrip('\n') will strip a newline character from the right side of string. The following version of our program strips the newlines when each line is read from the text file:

mylines = []                                # Declare an empty list.
with open('lorem.txt', 'rt') as myfile:     # Open lorem.txt for reading text.
    for myline in myfile:                   # For each line in the file,
        mylines.append(myline.rstrip('\n')) # strip newline and add to list.
for element in mylines:                     # For each element in the list,
    print(element)                          # print it.

The text is now stored in a list variable, so individual lines can be accessed by index number. Newlines were stripped, so we don't have to worry about them. We can always put them back later if we reconstruct the file and write it to disk.
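As a more compact alternative, the same list can be built with a list comprehension, and the newlines can be put back later with the string join() method. This is a sketch under stated assumptions: io.StringIO and its three sample lines stand in for the real file so the snippet runs on its own.

```python
import io

# Stand-in for a text file (an assumption so the sketch is self-contained).
source = io.StringIO("alpha\nbeta\ngamma\n")

# Build the list of stripped lines in one expression.
mylines = [line.rstrip("\n") for line in source]
print(mylines)

# Put the newlines back, for example before writing the text to disk.
restored = "\n".join(mylines) + "\n"
print(repr(restored))
```

The explicit loop and the comprehension produce the same list; the comprehension is simply shorter once the pattern is familiar.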

Now, let's search the lines in the list for a specific substring.

Searching text for a substring

Let's say we want to locate every occurrence of a certain phrase, or even a single letter. For instance, maybe we need to know where every "e" is. We can accomplish this using the string's find() method.

The list stores each line of our text as a string object. All string objects have a method, find(), which locates the first occurrence of a substring in the string.

Let's use the find() method to search for the letter "e" in the first line of our text file, which is stored in the list mylines. The first element of mylines is a string object containing the first line of the text file. This string object has a find() method.

In the parentheses of find(), we specify parameters. The first and only required parameter is the string to search for, "e". The statement mylines[0].find("e") tells the interpreter to search forward, starting at the beginning of the string, one character at a time, until it finds the letter "e." When it finds one, it stops searching, and returns the index number where that "e" is located. If it reaches the end of the string, it returns -1 to indicate nothing was found.

Example

print(mylines[0].find("e"))

Output:

3

The return value "3" tells us that the letter "e" is the fourth character, the "e" in "Lorem". (Remember, the index is zero-based: index 0 is the first character, 1 is the second, etc.)

The find() method takes two optional, additional parameters: a start index and a stop index, indicating where in the string the search should begin and end. For instance, string.find("abc", 10, 20) searches for the substring "abc", but only from the 11th to the 20th character (the stop index itself is not included). If stop is not specified, find() starts at index start, and stops at the end of the string.

Example

For instance, the following statement searches for "e" in mylines[0], starting at the fifth character.

print(mylines[0].find("e", 4))

Output:

24

In other words, starting at the 5th character in mylines[0], the first "e" is located at index 24 (the "e" in "amet").

Example

To start searching at index 10, and stop at index 30, this time in the third line:

print(mylines[2].find("e", 10, 30))

Output:

28

(The first "e" in "Maecenas").

If find() doesn't locate the substring in the search range, it returns the number -1, indicating failure:

print(mylines[0].find("e", 25, 30))

Output:

-1

There were no "e" occurrences between indices 25 and 30.
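The effect of the start and stop arguments can also be checked on a short literal string, where the positions are easy to count by hand. The string and variable names below are arbitrary and chosen for illustration.

```python
s = "one, then one again"

first = s.find("one")             # search the whole string
second = s.find("one", 1)         # start at index 1, skipping the first match
cut_off = s.find("one", 1, 12)    # stop index 12 cuts the second match short
whole = s.find("one", 1, 13)      # stop index 13 is just enough to include it

print(first, second, cut_off, whole)   # 0 10 -1 10
```

The second "one" occupies indexes 10 through 12, so the search range must extend to a stop index of at least 13 for find() to report it.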

Finding all occurrences of a substring

But what if we want to locate every occurrence of a substring, not just the first one we encounter? We can iterate over the string, starting from the index of the previous match.

In this example, we'll use a while loop to repeatedly find the letter "e". When an occurrence is found, we call find again, starting from a new location in the string: specifically, the location of the last occurrence, plus the length of the substring (so we can move forward past the last one). When find returns -1, or the start index exceeds the length of the string, we stop.

# Build array of lines from file, strip newlines

mylines = []                                # Declare an empty list.
with open('lorem.txt', 'rt') as myfile:     # Open lorem.txt for reading text.
    for myline in myfile:                   # For each line in the file,
        mylines.append(myline.rstrip('\n')) # strip newline and add to list.

# Locate and print all occurrences of letter "e"

substr = "e"                  # substring to search for.
for line in mylines:          # string to be searched
  index = 0                   # current index: character being compared
  prev = 0                    # previous index: last character compared
  while index < len(line):    # While index has not exceeded string length,
    index = line.find(substr, index)  # set index to first occurrence of "e"
    if index == -1:           # If nothing was found,
      break                   # exit the while loop.
    print(" " * (index - prev) + "e", end='')  # print spaces from previous
                                               # match, then the substring.
    prev = index + len(substr)       # remember this position for next loop.
    index += len(substr)      # increment the index by the length of substr.
                              # (Repeat until index > line length)
  print('\n' + line)          # Print the original string under the e's

Output:

   e                    e       e  e               e
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
                         e  e
Nunc fringilla arcu congue metus aliquam mollis.
        e                   e e          e    e      e
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
      e
Quisque at dignissim lacus.
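The same search loop can be wrapped in a small function that returns the positions instead of printing them, which is often easier to reuse. This is a minimal sketch, not a definitive implementation; find_all is a hypothetical helper name, and like the program above it reports non-overlapping matches only.

```python
def find_all(text, substr):
    """Return a list of every index at which substr occurs in text."""
    if not substr:
        raise ValueError("substr must not be empty")
    positions = []
    index = text.find(substr)
    while index != -1:
        positions.append(index)
        # Resume the search just past the match we found.
        index = text.find(substr, index + len(substr))
    return positions

line = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
print(find_all(line, "e"))    # [3, 24, 32, 35, 51]
```

The guard against an empty substring matters because find("") always succeeds without advancing, which would otherwise make the loop run forever.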

Incorporating regular expressions

For complex searches, use regular expressions.

The Python regular expressions module is called re. To use it in your program, import the module before you use it:

import re

The re module implements regular expressions by compiling a search pattern into a pattern object. Methods of this object can then be used to perform match operations.

For example, let's say you want to search for any word in your document which starts with the letter d and ends in the letter r. We can accomplish this using the regular expression "\bd\w*r\b". What does this mean?

Character sequence    Meaning
\b                    A word boundary matches an empty string (anything, including nothing at all), but only if it appears before or after a non-word character. "Word characters" are the digits 0 through 9, the lowercase and uppercase letters, or an underscore ("_").
d                     Lowercase letter d.
\w*                   \w represents any word character, and * is a quantifier meaning "zero or more of the previous character." So \w* will match zero or more word characters.
r                     Lowercase letter r.
\b                    Word boundary.

So this regular expression will match any string that can be described as "a word boundary, then a lowercase 'd', then zero or more word characters, then a lowercase 'r', then a word boundary." Strings described this way include the words destroyer, dour, and doctor, and the abbreviation dr.

To use this regular expression in Python search operations, we first compile it into a pattern object. For instance, the following Python statement creates a pattern object named pattern which we can use to perform searches using that regular expression.

pattern = re.compile(r"\bd\w*r\b")

Note

The letter r before our string in the statement above is important. It tells Python to interpret our string as a raw string, exactly as we've typed it. If we didn't prefix the string with an r, Python would interpret the escape sequences such as \b in other ways. Whenever you need Python to interpret your strings literally, specify it as a raw string by prefixing it with r.
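The difference is easy to check by comparing the lengths of the two kinds of literal. In the sketch below, the variable names are arbitrary; in an ordinary string literal, \b is the escape sequence for a single backspace character.

```python
plain = "\bd"    # "\b" becomes one backspace character: 2 characters total
raw = r"\bd"     # backslash, "b", "d": 3 characters total

print(len(plain), len(raw))    # 2 3
print(repr(plain))             # '\x08d'
print(repr(raw))               # '\\bd'
```

Only the raw version passes a backslash followed by b through to the re module, which is what the regular expression needs in order to mean "word boundary."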

Now we can use the pattern object's methods, such as search(), to search a string for the compiled regular expression, looking for a match. If it finds one, it returns a special result called a match object. Otherwise, it returns None, a built-in Python constant that is used like the boolean value "false".

import re
text = "Good morning, doctor."
pat = re.compile(r"\bd\w*r\b")      # compile regex "\bd\w*r\b" to a pattern object
if pat.search(text) is not None:    # Search for the pattern. If found,
    print("Found it.")

Output:

Found it.

To perform a case-insensitive search, you can specify the special constant re.IGNORECASE in the compile step:

import re
text = "Hello, Doctor."
pat = re.compile(r"\bd\w*r\b", re.IGNORECASE)  # upper and lowercase will match
if pat.search(text) is not None:
    print("Found it.")

Output:

Found it.
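Beyond a yes-or-no answer, the match object can report what was matched and where, and the pattern object's findall() method returns every match in the string. The sketch below uses a sample sentence chosen for illustration; the variable names are arbitrary.

```python
import re

pat = re.compile(r"\bd\w*r\b", re.IGNORECASE)
text = "Good morning, Doctor. The dour driver is here."

m = pat.search(text)                       # first match only
if m is not None:
    print(m.group(), m.start(), m.end())   # Doctor 14 20

words = pat.findall(text)                  # every match, as a list of strings
print(words)                               # ['Doctor', 'dour', 'driver']
```

The d at the end of "Good" is not reported, because it is not preceded by a word boundary.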

Putting it all together

So now we know how to open a file, read the lines into a list, and locate a substring in any given list element. Let's use this knowledge to build some example programs.

Print all lines containing substring

The program below reads a log file line by line. If the line contains the word "error," it is added to a list called errors. If not, it is ignored. The lower() string method converts all strings to lowercase for comparison purposes, making the search case-insensitive without altering the original strings.

Note that the find() method is called directly on the result of the lower() method; this is called method chaining. Also, note that in the errors.append() call, we construct an output string by joining several strings with the + operator.

errors = []                       # The list where we will store results.
linenum = 0
substr = "error".lower()          # Substring to search for.
with open('logfile.txt', 'rt') as myfile:
    for line in myfile:
        linenum += 1
        if line.lower().find(substr) != -1:    # if case-insensitive match,
            errors.append("Line " + str(linenum) + ": " + line.rstrip('\n'))
for err in errors:
    print(err)

Input (stored in logfile.txt):

This is line 1
This is line 2
Line 3 has an error!
This is line 4
Line 5 also has an error!

Output:

Line 3: Line 3 has an error!
Line 5: Line 5 also has an error!
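When only a yes-or-no answer is needed, the in operator can replace the find() comparison, and the built-in enumerate() function can keep the line count. The following is a sketch of that variation, not the original program; io.StringIO and its sample lines stand in for logfile.txt (an assumption, so the snippet runs on its own).

```python
import io

# Stand-in for logfile.txt (an assumption so the sketch is self-contained).
logfile = io.StringIO("This is line 1\nLine 2 has an Error!\nThis is line 3\n")

substr = "error"
errors = []
for linenum, line in enumerate(logfile, start=1):   # enumerate() counts the lines
    if substr in line.lower():                      # "in" tests for a substring
        errors.append("Line " + str(linenum) + ": " + line.rstrip("\n"))

for err in errors:
    print(err)
```

The find() method remains the better choice when the position of the match is needed, as in the earlier examples.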

Extract all lines containing substring, using regex

The program below is similar to the above program, but uses the re regular expressions module. The errors and line numbers are stored as tuples, e.g., (linenum, line). The tuple is created by the additional enclosing parentheses in the errors.append() statement. The elements of the tuple are referenced similar to a list, with a zero-based index in brackets. As constructed here, err[0] is a linenum and err[1] is the associated line containing an error.

import re
errors = []
linenum = 0
pattern = re.compile("error", re.IGNORECASE)  # Compile a case-insensitive regex
with open('logfile.txt', 'rt') as myfile:
    for line in myfile:
        linenum += 1
        if pattern.search(line) is not None:  # If a match is found
            errors.append((linenum, line.rstrip('\n')))
for err in errors:                            # Iterate over the list of tuples
    print("Line " + str(err[0]) + ": " + err[1])

Output:

Line 6: Mar 28 09:10:37 Error: cannot contact server. Connection refused.
Line 10: Mar 28 10:28:15 Kernel error: The specified location is not mounted.
Line 14: Mar 28 11:06:30 ERROR: usb 1-1: can't set config, exiting.

Extract all lines containing a phone number

The program below prints any line of a text file, info.txt, which contains a US or international phone number. It accomplishes this with the regular expression "(\+\d{1,2})?[\s.-]?\d{3}[\s.-]?\d{4}". This regex matches the following phone number notations:

  • 123-456-7890
  • (123) 456-7890
  • 123 456 7890
  • 123.456.7890
  • +91 (123) 456-7890

import re
errors = []
linenum = 0
pattern = re.compile(r"(\+\d{1,2})?[\s.-]?\d{3}[\s.-]?\d{4}")
with open('info.txt', 'rt') as myfile:
    for line in myfile:
        linenum += 1
        if pattern.search(line) is not None:  # If pattern search finds a match,
            errors.append((linenum, line.rstrip('\n')))
for err in errors:
    print("Line ", str(err[0]), ": " + err[1])

Output:

Line  3 : My phone number is 731.215.8881.
Line  7 : You can reach Mr. Walters at (212) 558-3131.
Line  12 : His agent, Mrs. Kennedy, can be reached at +12 (123) 456-7890
Line  14 : She can also be contacted at (888) 312.8403, extension 12.
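Before running a pattern like this against a real file, it can help to try it on a handful of literal strings. The sketch below checks the notations listed above plus one line with no number in it; the samples list and the results dictionary are names chosen for illustration.

```python
import re

pattern = re.compile(r"(\+\d{1,2})?[\s.-]?\d{3}[\s.-]?\d{4}")

samples = [
    "123-456-7890",
    "(123) 456-7890",
    "123 456 7890",
    "123.456.7890",
    "+91 (123) 456-7890",
    "no digits here",
]

# Map each sample to True if the pattern is found anywhere in it.
results = {s: pattern.search(s) is not None for s in samples}
for sample, found in results.items():
    print(sample, "->", found)
```

Because search() looks for the pattern anywhere in the string, a line is reported as soon as it contains three digits followed by four digits with an optional separator between them. That is a loose test, so lines containing other seven-digit runs would be reported too.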

Search a dictionary for words

The program below searches the dictionary for any words that start with h and end in pe. For input, it uses a dictionary file included on many Unix systems, /usr/share/dict/words.

import re
filename = "/usr/share/dict/words"
pattern = re.compile(r"\bh\w*pe$", re.IGNORECASE)
with open(filename, "rt") as myfile:
    for line in myfile:
        if pattern.search(line) is not None:
            print(line, end='')

Output:

Hope
heliotrope
hope
hornpipe
horoscope
hype


Source: https://www.computerhope.com/issues/ch001721.htm
