Learn Python – Wikipedia Module in Python: Basic and Advanced

In this article, we will discuss the Wikipedia module in Python and see how to use it from a Python script. We will fetch a variety of information from Wikipedia.

Introduction

The Internet is the largest source of information. Almost any knowledge is just one click away if we have an internet connection. Therefore, it is essential to understand how to retrieve the right information from the right source. Retrieving data from various sources on the web is known as data scraping. We have all used Wikipedia, a site packed with information.

Wikipedia is one of the largest platforms on the internet and holds a vast amount of information. It is an open platform maintained by a community of volunteer editors using a wiki-based editing system. It is a multilingual encyclopedia.

Python has a third-party wikipedia module (an API wrapper) for scraping data from Wikipedia pages. This module lets us fetch and parse data from Wikipedia. In simple terms, it works as a small scraper and can retrieve only a limited amount of data. Before we start working with it, we need to install the module on our local machine.

Installation

This module wraps the official Wikipedia API. First, we install the Wikipedia module using the following pip command. Type the command below in the terminal:

$ pip install wikipedia

The above command installs the module on the system. Now, we need to import it using the following statement.

import wikipedia  

Now we are ready to extract data from Wikipedia.
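Before moving on, it can help to confirm that the installation worked. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and is not part of the Wikipedia module.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("wikipedia"):
    print("wikipedia is installed")
else:
    print("wikipedia is missing; run: pip install wikipedia")
```

This only checks that the package can be found; it does not import it or contact Wikipedia.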

Getting Started with Wikipedia Module

The Wikipedia module provides a number of built-in methods that help us get the desired information.

Search Title and Result

The Python Wikipedia module lets us search for a query, passed as an argument, using the search() method. This method returns a list of article titles that match the query. Let's understand the following example.

Example –

import wikipedia  
# Searching for a title
print(wikipedia.search("India"))  

Output:

['India', 'Constitution of India', 'Demographics of India', 'Languages of India', 'Republic Day (India)', 'Government of India', 'Economy of India', 'History of India', 'The Times of India', 'List of prime ministers of India']

As we can see in the above output, the method returned the matching title along with related titles. We can limit the number of titles returned by passing a value for the results parameter. Consider the following example.

Example –

import wikipedia  
# Searching for a title, limited to four results
print(wikipedia.search("India", results = 4))  

Output:

['India', 'Constitution of India', 'Demographics of India', 'Languages of India']

The above code printed four results because we requested only four.
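The search call can be wrapped in a small helper so the limit is applied in one place. This is only a sketch: search_titles and its search_fn parameter are hypothetical names introduced here, and fake_search is a stand-in for wikipedia.search so the example runs without a network connection.

```python
def search_titles(query, limit=5, search_fn=None):
    """Return at most `limit` article titles matching `query`."""
    if search_fn is None:
        # Imported lazily so the helper can be tried without the package.
        import wikipedia
        search_fn = wikipedia.search
    titles = search_fn(query, results=limit)
    return list(titles)[:limit]


# Offline stand-in for wikipedia.search (hypothetical sample data).
def fake_search(query, results=10):
    sample = ["India", "Constitution of India", "Demographics of India"]
    return sample[:results]


print(search_titles("India", limit=2, search_fn=fake_search))
# prints ['India', 'Constitution of India']
```

When search_fn is left out, the helper falls back to wikipedia.search and queries Wikipedia itself.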

Suggestion

As the name suggests, the suggest() method returns the suggested Wikipedia title for the query, or None if it doesn't find any. Let's see the following example.

Example –

import wikipedia  
  
print(wikipedia.suggest("Coronavrdsf"))  

Output:

None

In the above code, we meant to search for "Coronavirus" but typed a badly misspelled query. The suggest() method returned None because it found no suggestion for that query.
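One way to combine search() and suggest() is to retry a failed search with the suggested title. The sketch below assumes that behaviour is wanted; resolve_query, fake_search and fake_suggest are hypothetical names, and the two fakes stand in for wikipedia.search and wikipedia.suggest so the example runs offline.

```python
def resolve_query(query, search_fn=None, suggest_fn=None):
    """Search for `query`; if nothing is found, retry once with a suggestion.

    Returns a (titles, query_used) tuple.
    """
    if search_fn is None or suggest_fn is None:
        # Imported lazily so the helper can be tried without the package.
        import wikipedia
        search_fn = search_fn or wikipedia.search
        suggest_fn = suggest_fn or wikipedia.suggest
    titles = search_fn(query)
    if titles:
        return titles, query
    suggestion = suggest_fn(query)
    if suggestion is None:
        # Nothing found and nothing suggested, as with "Coronavrdsf".
        return [], query
    return search_fn(suggestion), suggestion


# Offline stand-ins (hypothetical sample behaviour).
def fake_search(query):
    return ["Coronavirus"] if query == "coronavirus" else []


def fake_suggest(query):
    return "coronavirus" if query == "Coronavrus" else None


print(resolve_query("Coronavrus", fake_search, fake_suggest))
# prints (['Coronavirus'], 'coronavirus')
```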

Summary of the Article

The Python Wikipedia module offers the summary() method, which returns the summary of an article. Here it takes two arguments, title and sentences, and returns the summary as a string. Let's consider the example below.

Example –

import wikipedia  
  
print(wikipedia.summary("Rohit Sharma", sentences=4))  

Output:

Rohit Gurunath Sharma (born 30 April 1987) is an Indian international cricketer who plays for Mumbai in domestic cricket and captains Mumbai Indians in the Indian Premier League as a right-handed batsman and an occasional right-arm off break bowler. He is the vice-captain of the Indian national team in limited-overs formats.
Outside cricket, Sharma is an active supporter of animal welfare campaigns. He is the official Rhino Ambassador for WWF-India and is a member of People for the Ethical Treatment of Animals (PETA).

The summary of the given title was printed, and we customized the number of sentences displayed in the summary text using the sentences argument.

Keep in mind that the summary() method raises a DisambiguationError if the title is ambiguous, that is, if it may refer to more than one page. (A title that matches no page raises a PageError instead.) Let's understand the following example.

Example –

import wikipedia

print(wikipedia.summary("key"))

Output:

Traceback (most recent call last):
  File "C:/Users/DEVANSH SHARMA/PycharmProjects/MyPythonProject/pillow_image.py", line 194, in <module>
    print(wikipedia.summary("key"))
  File "C:\Users\DEVANSH SHARMA\PycharmProjects\MyPythonProject\venv\lib\site-packages\wikipedia\util.py", line 28, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "C:\Users\DEVANSH SHARMA\PycharmProjects\MyPythonProject\venv\lib\site-packages\wikipedia\wikipedia.py", line 231, in summary
    page_info = page(title, auto_suggest=auto_suggest, redirect=redirect)
  File "C:\Users\DEVANSH SHARMA\PycharmProjects\MyPythonProject\venv\lib\site-packages\wikipedia\wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "C:\Users\DEVANSH SHARMA\PycharmProjects\MyPythonProject\venv\lib\site-packages\wikipedia\wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
wikipedia.exceptions.DisambiguationError: "Key" may refer to: 
Key (cryptography)
Key (lock)
Key (map)
typewriter
test
Cay
Key, Alabama
Key, Ohio
Key, West Virginia
Keys, Oklahoma
Florida Keys
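Instead of letting the traceback end the script, the exception can be caught and its list of candidate titles inspected. The following is a sketch, not the module's own recommended pattern: safe_summary and its wiki parameter are hypothetical, and the fake classes below imitate wikipedia.exceptions.DisambiguationError (which carries an options list) and wikipedia.exceptions.PageError so the example runs offline.

```python
from types import SimpleNamespace


def safe_summary(title, sentences=2, wiki=None):
    """Return (summary, options).

    summary is None when the lookup fails; options lists the candidate
    titles when the title was ambiguous, and is empty otherwise.
    """
    if wiki is None:
        # Imported lazily so the helper can be tried without the package.
        import wikipedia as wiki
    try:
        return wiki.summary(title, sentences=sentences), []
    except wiki.exceptions.DisambiguationError as err:
        return None, list(err.options)
    except wiki.exceptions.PageError:
        return None, []


# Offline stand-ins for the module and its exceptions (hypothetical).
class FakeDisambiguationError(Exception):
    def __init__(self, options):
        super().__init__("ambiguous title")
        self.options = options


class FakePageError(Exception):
    pass


def fake_summary(title, sentences=0):
    if title == "key":
        raise FakeDisambiguationError(["Key (cryptography)", "Key (lock)"])
    if title == "no such page":
        raise FakePageError(title)
    return "A short summary of " + title + "."


fake_wiki = SimpleNamespace(
    summary=fake_summary,
    exceptions=SimpleNamespace(
        DisambiguationError=FakeDisambiguationError,
        PageError=FakePageError,
    ),
)

print(safe_summary("key", wiki=fake_wiki))
# prints (None, ['Key (cryptography)', 'Key (lock)'])
```

A caller can then show the options to the user or retry with one of the listed titles, such as "Key (lock)".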

Extracting Metadata of Title

We can get the full plain-text content of a Wikipedia page, excluding images, tables, etc. The page object provides the content attribute for this. Let's see the following example.

Example –

import wikipedia  
  
print(wikipedia.page("Sachin Tendulkar").content)  

Output:

Sachin Ramesh Tendulkar ( (listen); born 24 April 1973) is an Indian former international cricketer who served as captain of the Indian national team. He is widely regarded as one of the greatest batsmen in the history of cricket. He is the highest run scorer of all time in International cricket. Considered as the world's most prolific batsman of all time, he is the only player to have scored one hundred international centuries, the first batsman to score a double century in a One Day International (ODI), the holder of the record for the most runs in both Test and ODI cricket, and the only player to complete more than 30,000 runs in international cricket. In 2013, he was the only Indian cricketer included in an all-time Test World XI named to mark the 150th anniversary of Wisden Cricketers' Almanac.
............
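Since content is one long string, it is often convenient to break it into sections. The sketch below assumes that section headings appear in the text on their own lines in the form "== Heading ==" (with more equals signs for subsections); split_sections is a hypothetical helper, and the sample string is made up for illustration.

```python
import re

# A heading line such as "== Early life ==" or "=== School ===".
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$", re.MULTILINE)


def split_sections(content, intro_name="Introduction"):
    """Split page text into a list of (heading, body) pairs.

    Text before the first heading is stored under `intro_name`.
    """
    sections = []
    name = intro_name
    start = 0
    for match in HEADING.finditer(content):
        sections.append((name, content[start:match.start()].strip()))
        name = match.group(2)
        start = match.end()
    sections.append((name, content[start:].strip()))
    return sections


sample = (
    "Lead paragraph.\n\n\n"
    "== Early life ==\nBorn in Mumbai.\n\n\n"
    "=== School ===\nDetails.\n"
)
for heading, body in split_sections(sample):
    print(heading, "->", body)
```

With a real page, the same function could be applied to wikipedia.page("Sachin Tendulkar").content.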

Getting Full Wikipedia Page Data

The Python Wikipedia module lets us get a full Wikipedia page using the page() function. It returns a page object that exposes the page content, categories, coordinates, images, links and other metadata. Let's understand the following example.

Example –

import wikipedia  
  
# wikipedia page object is created
page_object = wikipedia.page("America")

# printing html of page_object (html() is a method, so it must be called)
print(page_object.html())

# printing title
print(page_object.original_title)

# printing links on that page object
print(page_object.links[0:20])

Output:

(full HTML of the page, omitted here for length)
United States
['.as', '.com', '.edu', '.gov', '.gu', '.mil', '.mp', '.net', '.org', '.pr', '.um', '.us', '.vi', '100th meridian west', '117th United States Congress', '1790 United States Census', '1800 United States Census', '1810 United States Census', '1820 United States Census', '1830 United States Census']
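Printing every attribute of a page can produce a lot of output, so a short overview is often enough. The sketch below collects a few attributes of the page object (title, url and links) into a dictionary; describe_page is a hypothetical helper, and fake_page is a stand-in with made-up values so the example runs without fetching a page.

```python
from types import SimpleNamespace


def describe_page(page, max_links=5):
    """Collect a compact overview of a page object into a dictionary."""
    return {
        "title": page.title,
        "url": page.url,
        "link_count": len(page.links),
        "first_links": list(page.links[:max_links]),
    }


# Offline stand-in for the object returned by wikipedia.page().
fake_page = SimpleNamespace(
    title="United States",
    url="https://en.wikipedia.org/wiki/United_States",
    links=[".as", ".com", ".edu"],
)

print(describe_page(fake_page, max_links=2))
```

With the real module, the argument would be the object returned by wikipedia.page("America").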

Customizing the Page Language

We can change the language in which pages are fetched. The set_lang() method is used to change the page language. Each language has a standard prefix code, which is passed as an argument to the method. Let's understand the following example.

Example –

import wikipedia  
wikipedia.set_lang("hi")  
  
print(wikipedia.summary("Python"))  

Output:

[Hindi-language summary of the "Python" article, printed in Devanagari script]

As we can see in the above output, the requested page was returned in Hindi. We can switch to any supported language using the set_lang() method.
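Because set_lang() changes the language for every later call, it can be useful to switch back once the request is done. The following sketch assumes English ("en") is the language to restore; summary_in and its wiki parameter are hypothetical, and fake_wiki is a stand-in that only records the calls so the example runs offline.

```python
from types import SimpleNamespace


def summary_in(lang, title, sentences=2, wiki=None, default_lang="en"):
    """Fetch a summary in `lang`, then switch back to `default_lang`."""
    if wiki is None:
        # Imported lazily so the helper can be tried without the package.
        import wikipedia as wiki
    wiki.set_lang(lang)
    try:
        return wiki.summary(title, sentences=sentences)
    finally:
        # Runs even if summary() raises, so the language is always restored.
        wiki.set_lang(default_lang)


# Offline stand-in that records every set_lang() call (hypothetical).
calls = []
fake_wiki = SimpleNamespace(
    set_lang=calls.append,
    summary=lambda title, sentences=0: "summary of " + title,
)

print(summary_in("hi", "Python", wiki=fake_wiki))
# prints summary of Python
print(calls)
# prints ['hi', 'en']
```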

Conclusion

We have covered the important concepts of the Wikipedia API using Python code. We have also discussed how to get a variety of information, such as page title, summary and categories, and how to extract data from the web.