Power BI Using List.Generate to Reduce API Calls with Pagination

Intro

In this post, I am going to use the Power Query function List.Generate to reduce API calls.
I would build on the approach used (for example) here, and suggest an improvement which we implemented in one of our client solutions. The main benefits are a shorter query execution time and a reduced likelihood to hit the throttling limit when calling a web service.

The main idea is to replace the process of looping through the pages a hardcoded number of times, with a more clever one which stops on the first empty table, signified by an error.

 

The Scenario

The Power BI client solution uses Power Query to retrieve a changing number of data records from a web service on a regular basis. Each record includes timesheets related data and has the following fields: [Worker, Date, Work Hours].

In order to retrieve the data, we invoke a series of Web.Contents function calls. The “options” argument of each Web.Contents call is a string which includes a date interval and a page number.

The date interval is pre-calculated using two queries whose content is irrelevant to this post:

LG (1)

The page number is required since the web service returns a table whose records correspond to an ordered list of pages (In practice Web.Contents returns a binary file from which we need to extract the table), like this:

LG (2+3)

We do not know what is the last relevant page number (the number of the last page which contains data).

Passing a page number which exceeds the last relevant page number to the Web.Contents function results in an empty table, but I encountered other cases (with other web services) in which I got an error or some boolean indication.

 

How It was Done Before

  1. Creating a custom function, “fnGetTimeSheetsData”-

LG (4)

fnGetTimeSheetsData passes its arguments to Web.Contents and performs some necessary drilling and cleaning operations which are specific to the client solution and are irrelevant to this post.

One point of interest is that passing a page numbers which exceed the last relevant page number to the fnGetTimeSheetsData function results in an error (due to the empty table).

  1. Creating a list of 100 page numbers (1-100), and converting it to a table.
  2. Adding a custom column which invokes the fnGetTimeSheetsData function, for the dates and the current page number.

LG (5)

  1. Removing records which contain an error (where we exceeded the last relevant page number).
  2. Removing the page number column and expanding the custom column.

 

The New Trick !

  1. Creating the custom function fnGetTimeSheetsData.
  2. Building a dynamic list of tables by testing the result of the fnGetTimeSheetsData function for an error.

LG (6)

That’s it !

We just reduced the number of API call from 100 to 6 (even unsuccessful calls count for the throttling limit).

We start with a dummy row:

()=>[Page Number=0,Func=null]

This simplifies the code and prevents the need to address the situation of an empty list in the following query steps.

Each successive element in the list is a record with two fields:

each [Page Number=_[Page Number]+1,Func=fnGetTimeSheetsData([Page Number]+1,KnownStartDate,KnownEndDate)]

“Page Number” is the successive page number and “Func” is the result of a call to the fnGetTimeSheetsData function for said successive page number.

The process stops once we get the first error from the fnGetTimeSheetsData function (the last iteration is excluded from the resulting list):

each (try _[Func])[HasError]=false

This condition can be adjusted to handle other indications for exceeding the last relevant page number.

The resulting list consists of only the “Func” field:

each _[Func]
  1. Converting the list to a table.
  2. Removing the dummy row and expanding the column.

 

The full code:

let
    AddFunctionCall=List.Generate(()=>[Page Number=0,Func=null], each (try _[Func])[HasError]=false, each [Page Number=_[Page Number]+1, Func=fnGetTimeSheetsData([Page Number]+1, KnownStartDate,KnownEndDate)], each _[Func]),
    ConvertToTable = Table.FromList(AddFunctionCall, Splitter.SplitByNothing(), null, null),
    RemoveDemiRow = Table.Skip(ConvertToTable,1),
    ExpandColumn1 = Table.ExpandTableColumn(RemoveDemiRow, "Column1", {"Worker", "Date", "Work Hours"})
in
    ExpandColumn1

 

Conclusion

  1. This approach dramatically reduces the number of API calls.
  2. List.Generate is a powerful function, so get creative !

Comments