How to Read a Data Set From an API in R

Learning Objectives

After completing this tutorial, you will be able to:

  • Access data from the Colorado data warehouse RESTful API.
  • Describe and recognize query parameters in a RESTful call.
  • Identify the request and the response in the context of API data access.
  • Define API endpoint in the context of the SODA API.
  • List the two potential responses that you may get when querying a RESTful API.
  • Use the mutate_at() function with dplyr pipes to adjust the format / data type of multiple columns.

What You Need

You will need a computer with internet access to complete this lesson.

In the previous lessons, you learned how to programmatically access data stored in human-readable text files using:

  1. download.file() to download a file to your computer and work with it (ideal if you want to save a copy of the data to your computer).
  2. read.csv(), ideal for reading a tabular file stored on the web, though it may sometimes fail when secure connections are involved (e.g., https).
  3. fromJSON(), ideal for data accessed in JSON format.

In this lesson, you will learn about APIs (application programming interfaces). An API allows you to access data stored on a computer or server using a specific query. APIs are a powerful way to programmatically access data, and more specifically the particular type and subset of data that you need for your analysis.

You will also explore the machine-readable JSON data structure. Machine-readable data structures are more efficient, particularly for larger data that contain hierarchical structures. In this lesson, you will use the fromJSON() function from the jsonlite package to import data from an API, provided in .json format, into a data.frame.

    # NOTE: if you have problems with ggmap, try installing both ggplot2 and ggmap from GitHub
    # devtools::install_github("dkahle/ggmap")
    # devtools::install_github("hadley/ggplot2")
    library(ggmap)
    library(ggplot2)
    library(dplyr)
    library(rjson)
    library(jsonlite)  # loaded after rjson, so jsonlite's fromJSON() masks rjson's
    library(RCurl)

REST API Review

Recall that in the first lesson in this module, you learned about RESTful APIs. You explored the concept of a request and a subsequent response. A request to a RESTful API is composed of a URL and the associated parameters required to access the particular subset of the data that you wish to access.

When you send the request, the web API returns one of the following:

  1. The data that you requested, or
  2. An error message, which tells you that something was wrong with your request.
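Those two outcomes can be handled explicitly in code. The sketch below is not part of the original lesson: safe_request() is a hypothetical helper that wraps any download function in tryCatch() and returns either the data or the error message. The two fake fetchers are stand-ins for a real API call so the example runs offline.

```r
# Hypothetical helper: call fetch(url) and report which of the two
# possible outcomes occurred (data returned, or an error message).
safe_request <- function(url, fetch) {
  tryCatch(
    list(ok = TRUE, data = fetch(url), message = NULL),
    error = function(e) {
      list(ok = FALSE, data = NULL, message = conditionMessage(e))
    }
  )
}

# Stand-ins for a real API call, so the sketch runs without a network
fake_success <- function(url) '[{"year":"1990","age":"20"}]'
fake_failure <- function(url) stop("HTTP error 400: bad request")

endpoint <- "https://data.colorado.gov/resource/tv8u-hswn.json"
good <- safe_request(endpoint, fake_success)
bad  <- safe_request(endpoint, fake_failure)

good$ok      # TRUE: the data came back
bad$message  # "HTTP error 400: bad request"
```

With a real API, fetch could be a function such as RCurl::getURL(); whether an HTTP error status actually raises an R error depends on the function and its options, so check that before relying on this pattern.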

In this lesson, you will access data stored in JSON format from a RESTful API.

Colorado Population Projection data

The Colorado Information Marketplace is a comprehensive data warehouse that contains a broad range of Colorado-specific open datasets available via a RESTful API called the Socrata Open Data API (SODA).

API Endpoints

There are many API endpoints, or data sets, available via this API. An endpoint refers to a dataset that you can access and query against.

The "endpoint" of a (SODA) API is simply a unique URL that represents an object or collection of objects. Every Socrata dataset, and even every individual data record, has its own endpoint. The endpoint is what you'll point your HTTP client at to interact with data resources. Read more about endpoints.

One endpoint on the CO data warehouse website contains Colorado Population Projections. If you click on the Colorado Population Projections data link (JSON format), you will see data returned in JSON format. These data include population estimates for males and females for every county in Colorado for every year from 1990 to 2040 for multiple age groups.

URL Parameters

Using URL parameters, you can define a more specific request to limit the data you get back in response to your API request. For example, if you only want data for Boulder, Colorado, you can query just that subset of the data using the RESTful call. In the link below, note that the ?&county=Boulder part of the URL asks the API to return only data for Boulder County, Colorado.

Like this: https://data.colorado.gov/resource/tv8u-hswn.json?&county=Boulder.

Parameters associated with accessing data using this API are documented here.
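A query string like ?county=Boulder can also be assembled from a named list, which keeps the parameters readable once there are several of them. This is a minimal base-R sketch; build_query_url() is a hypothetical helper, not a function from any package, and it does not send a request.

```r
# Hypothetical helper: turn a named list of URL parameters into a query string
build_query_url <- function(base_url, params) {
  # one "name=value" pair per parameter
  pairs <- paste0(names(params), "=", unlist(params, use.names = FALSE))
  # join the pairs with "&" and attach them after "?"
  paste0(base_url, "?", paste(pairs, collapse = "&"))
}

boulder_url <- build_query_url(
  "https://data.colorado.gov/resource/tv8u-hswn.json",
  list(county = "Boulder")
)
boulder_url
# "https://data.colorado.gov/resource/tv8u-hswn.json?county=Boulder"
```

Adding another element to the list, for example list(county = "Boulder", year = 2020), appends another &name=value pair.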

Using the Colorado SODA API

The Colorado SODA API allows you to write 'queries' that filter out the exact subset of the data that you want. Here's the API URL for population projections for females aged 20-40 who live in Boulder, for the years 2016-2025:

    https://data.colorado.gov/resource/tv8u-hswn.json?$where=age between 20 and 40 and year between 2016 and 2025&county=Boulder&$select=year,age,femalepopulation

Click here to view the data (JSON format).
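The $where clause in that URL is just a string, so it can be generated from numeric ranges instead of typed by hand. A small sketch using base R's sprintf(); the variable names are illustrative, and the ranges are the ones from the example URL.

```r
# Age and year ranges for the query (integers, as required by %d)
age_range  <- c(20L, 40L)
year_range <- c(2016L, 2025L)

# Build the SoQL $where clause from the ranges
where_clause <- sprintf("age between %d and %d and year between %d and %d",
                        age_range[1], age_range[2],
                        year_range[1], year_range[2])

# Assemble the same URL shown above (spaces still need encoding before use)
query_url <- paste0("https://data.colorado.gov/resource/tv8u-hswn.json?",
                    "$where=", where_clause,
                    "&county=Boulder",
                    "&$select=year,age,femalepopulation")
query_url
```

Changing age_range or year_range now changes the query without editing the URL text itself.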

API Response

The data returned from an API request are called the response. The response is most often formatted as a plain-text 'file', such as JSON or .csv.

Data Tip: Many APIs allow you to specify the format of the data that you want returned in the response. The Colorado SODA API is no exception; check out the documentation.
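As a sketch of what that can look like: on SODA, the response format is typically chosen by the file extension of the endpoint, so swapping .json for .csv should request the same data as CSV. Treat that as an assumption to confirm in the SODA documentation. switch_format() is a hypothetical helper that only rewrites the URL; it does not make a request.

```r
# Hypothetical helper: request a different response format by changing
# the endpoint's file extension (assumes SODA's ".json" / ".csv" convention)
switch_format <- function(url, new_ext) {
  sub("\\.json", paste0(".", new_ext), url)
}

csv_url <- switch_format(
  "https://data.colorado.gov/resource/tv8u-hswn.json?county=Boulder", "csv")
csv_url
# "https://data.colorado.gov/resource/tv8u-hswn.csv?county=Boulder"

# The CSV response could then be read with, for example:
# read.csv(URLencode(csv_url))
```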

Accessing API Data

The first thing that you need to do is create your API request string. Recall that this is a URL with parameters that specify which subset of the data you want to access.

Note that you are using a new function, paste0(), to paste together a complex URL string. This is useful because you may want to iterate over different subsets of the same data (i.e., reuse the base URL or endpoint but request different subsets using different URL parameters).

    # Base URL path
    base_url <- "https://data.colorado.gov/resource/tv8u-hswn.json?"
    full_url <- paste0(base_url, "county=Boulder",
                       "&$where=age between 20 and 40",
                       "&$select=year,age,femalepopulation")
    # view full url
    full_url
    ## [1] "https://data.colorado.gov/resource/tv8u-hswn.json?county=Boulder&$where=age between 20 and 40&$select=year,age,femalepopulation"
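Because the base URL stays the same, paste0() also makes it easy to generate one request URL per subset. The sketch below builds URLs for several counties; the county list is only illustrative, and no requests are sent.

```r
base_url <- "https://data.colorado.gov/resource/tv8u-hswn.json?"
counties <- c("Boulder", "Denver", "Larimer")  # illustrative subset

# paste0() is vectorized, so this returns one URL per county
county_urls <- paste0(base_url,
                      "county=", counties,
                      "&$select=year,age,femalepopulation")
county_urls[2]
# "https://data.colorado.gov/resource/tv8u-hswn.json?county=Denver&$select=year,age,femalepopulation"
```

Each element could then be encoded and fetched in a loop or with lapply().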

After you've created the URL, you can get the data. There are a few ways to access the data; however, the most direct way is to:

  1. Use URLencode() to replace the spaces in your URL with %20, the percent-encoded (ASCII hex) value for a space.
  2. Use the fromJSON() function from the jsonlite package to import the data into a data.frame object.

Let's give it a try. First, encode the URL to replace all spaces with %20.

    # encode the URL, replacing each space with %20
    full_url <- URLencode(full_url)
    full_url
    ## [1] "https://data.colorado.gov/resource/tv8u-hswn.json?county=Boulder&$where=age%20between%2020%20and%2040&$select=year,age,femalepopulation"
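To see exactly what URLencode() changes, compare its default behaviour with reserved = TRUE. By default, reserved URL characters such as ?, &, $ and = are left alone, which is what you want for a full URL; reserved = TRUE encodes them too, which only makes sense for a single parameter value. The repeated argument (default FALSE) also means an already-encoded string is returned unchanged.

```r
where_value <- "age between 20 and 40"

# Default: only unsafe characters (here, the spaces) are percent-encoded
encoded_where <- URLencode(where_value)
encoded_where
# "age%20between%2020%20and%2040"

# Already-encoded input is returned unchanged by default (repeated = FALSE)
identical(URLencode(encoded_where), encoded_where)
# TRUE

# reserved = TRUE also encodes characters such as "$" and "="
encoded_param <- URLencode("$where=age between 20 and 40", reserved = TRUE)
encoded_param
```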

Then, import the data directly into a data.frame using the fromJSON() function from the jsonlite package.

    library(jsonlite)
    library(RCurl)
    # Download the JSON text with RCurl, then parse it into a data.frame with jsonlite
    pop_proj_data_df <- jsonlite::fromJSON(getURL(full_url))
    head(pop_proj_data_df, n = 2)
    ##   year age femalepopulation
    ## 1 1990  20             2751
    ## 2 1990  21             2615
    # a data.frame is stored as a list of columns, so typeof() reports "list"
    typeof(pop_proj_data_df)
    ## [1] "list"
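To see why the columns come back as text, you can parse a small JSON string shaped like the SODA response without touching the network. This sketch assumes, as the str() output in this lesson suggests, that the API quotes its numeric values; the two records reuse values shown in the head() output above.

```r
library(jsonlite)

# Two records in the same shape as the API response; note the quoted numbers
json_text <- '[{"year":"1990","age":"20","femalepopulation":"2751"},
               {"year":"1990","age":"21","femalepopulation":"2615"}]'

# jsonlite::fromJSON() simplifies an array of objects into a data.frame
sample_df <- jsonlite::fromJSON(json_text)
class(sample_df)      # "data.frame"
class(sample_df$age)  # "character", because the values were quoted strings
```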

Data Tip: The getForm() function from the RCurl package is another way to access API-driven data. You won't learn it in this class; however, it is a good option that results in slightly cleaner code, because the various parameters are passed to the function as named arguments.

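    library(RCurl)
    # base endpoint without a trailing "?"; getForm() builds the query string itself
    base_url_example <- "https://data.colorado.gov/resource/tv8u-hswn.json"
    # each named argument becomes one URL parameter (age = "20" is only an example value)
    pop_json_example <- getForm(base_url_example,
                                county = "Boulder",
                                age = "20")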

Also note that if you want to use getURL() on its own, you can do so as follows:

    # get the raw JSON text from the specified URL using RCurl
    # (URLencode() leaves an already-encoded URL unchanged by default)
    library(RCurl)
    pop_proj_data_example <- getURL(URLencode(full_url))

Now that your data are in a data.frame format, you can clean them up. Let's take a close look at the data structure. Are the values in the correct format to work with them quantitatively?

    # view data structure
    str(pop_proj_data_df)
    ## 'data.frame':	1000 obs. of  3 variables:
    ##  $ year            : chr  "1990" "1990" "1990" "1990" ...
    ##  $ age             : chr  "20" "21" "22" "23" ...
    ##  $ femalepopulation: chr  "2751" "2615" "2167" "1798" ...

The values were imported as character strings because the API returns each value as a quoted JSON string. However, if you want to plot and manipulate the data quantitatively, you need them in a numeric format. Let's fix that next.
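Before converting whole columns, it helps to know how as.numeric() behaves on character input: valid numbers convert cleanly, but any value that cannot be parsed becomes NA (with a warning) instead of stopping your code. A quick base-R check with made-up example values:

```r
# Character values as they arrive from the API
ages_chr <- c("20", "21", "22")
ages_num <- as.numeric(ages_chr)
ages_num + 1  # arithmetic now works: 21 22 23

# A value that cannot be parsed becomes NA (R normally warns about this)
bad_num <- suppressWarnings(as.numeric(c("20", "twenty")))
bad_num
sum(is.na(bad_num))  # count values that failed to convert: 1
```

Checking sum(is.na(...)) after a conversion is a cheap way to catch unexpected text in a column.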

mutate_at from dplyr

You can use the mutate_at() function in a dplyr pipe to change the format of (or apply any function to) any columns within your data.frame. In this case, you want to convert all of the columns to a numeric format.

To use mutate_at(), you specify the names of the columns that you want to convert in a vector, followed by the function that you wish to apply to each column. The function in this case is as.numeric().

Because you are using this function in a pipe, your code looks like this:

    # convert the listed columns to numeric
    pop_proj_data_df <- pop_proj_data_df %>%
      mutate_at(c("age", "year", "femalepopulation"), as.numeric)
    str(pop_proj_data_df)
    ## 'data.frame':	1000 obs. of  3 variables:
    ##  $ year            : num  1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
    ##  $ age             : num  20 21 22 23 24 25 26 27 28 29 ...
    ##  $ femalepopulation: num  2751 2615 2167 1798 1692 ...
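In dplyr 1.0.0 and later, mutate_at() is superseded by across() used inside mutate(). The same conversion can be written as follows; this sketch runs on a small hand-built stand-in for pop_proj_data_df (values copied from the output above) so it works offline.

```r
library(dplyr)

# Small stand-in for pop_proj_data_df, with character columns as returned by the API
sample_df <- data.frame(year = c("1990", "1990"),
                        age = c("20", "21"),
                        femalepopulation = c("2751", "2615"),
                        stringsAsFactors = FALSE)

# across() applies as.numeric() to each listed column
sample_num <- sample_df %>%
  mutate(across(c(age, year, femalepopulation), as.numeric))

str(sample_num)
```

Both forms produce the same result here; across() is simply the interface the dplyr documentation now recommends.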

Data Tip: Note that the code below is a much more VERBOSE version of what you did above cleanly with mutate_at(). dplyr is a much more efficient way to convert the format of several columns of data!

    # convert EACH column to a numeric format
    # note: this is the clunky way to do what you did above with dplyr!
    pop_proj_data_df$age <- as.numeric(pop_proj_data_df$age)
    pop_proj_data_df$year <- as.numeric(pop_proj_data_df$year)
    pop_proj_data_df$femalepopulation <- as.numeric(pop_proj_data_df$femalepopulation)
    # OR use lapply() to convert every column in the data.frame to numbers
    # pops <- as.data.frame(lapply(pop_proj_data_df, as.numeric))
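The lapply() approach can also be restricted to just the columns you name, which avoids converting columns that should stay text. A base-R sketch on a small stand-in data.frame; the county column is hypothetical here (the lesson's $select drops it) and is included only to show a column being left alone.

```r
# Stand-in data with one column that should remain text
pop_sample <- data.frame(county = c("Boulder", "Boulder"),
                         year = c("1990", "1990"),
                         age = c("20", "21"),
                         femalepopulation = c("2751", "2615"),
                         stringsAsFactors = FALSE)

num_cols <- c("year", "age", "femalepopulation")

# Replace only the selected columns; assigning into df[cols] keeps it a data.frame
pop_sample[num_cols] <- lapply(pop_sample[num_cols], as.numeric)
str(pop_sample)
```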

Once you have converted your data to a numeric format, you can plot it using ggplot().

    # plot the data
    ggplot(pop_proj_data_df, aes(x = year, y = femalepopulation,
                                 group = factor(age), color = age)) +
      geom_line() +
      labs(x = "Year",
           y = "Female Population - Age 20-40",
           title = "Projected Female Population",
           subtitle = "Boulder, CO: 1990 - 2040")

Female population age 20-40.

Optional Challenge

Using the population projection data that you just used, create a plot of projected MALE population numbers as follows:

  • Time span: 1990-2040
  • Column category: malepopulation
  • Age range: 60-80 years old

Use ggplot() to create your plot, and be sure to label the x and y axes and give the plot a descriptive title.
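As a starting point for the challenge, the request URL can be built the same way as before, assuming Boulder County as in the lesson and swapping the age range and the selected column. One thing worth checking: the str() output above reports exactly 1000 observations, while 51 years (1990-2040) times 21 ages is 1,071 rows. Our understanding is that SODA returns at most 1,000 rows per request unless you set the $limit parameter, so this sketch adds one; confirm the parameter and its default in the SODA paging documentation before relying on it.

```r
base_url <- "https://data.colorado.gov/resource/tv8u-hswn.json?"

male_url <- paste0(base_url,
                   "county=Boulder",
                   "&$where=age between 60 and 80",
                   "&$select=year,age,malepopulation",
                   "&$limit=5000")  # assumption: raises SODA's default row cap
male_url <- URLencode(male_url)
male_url

# Then, as in the lesson:
# male_df <- jsonlite::fromJSON(RCurl::getURL(male_url))
```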

Example Homework Plot

Male population ages 60-80.


Source: https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/API-data-access-r/
