Processing Steps

Processing steps are functions that run every single time you crawl a website/API. They help you bend and transform monitoring results into standardized data.

To build a Market Intelligence monitoring pipeline, Midesk offers many ways to process and extract the exact results you want.

A default view of the user interface with a processing step that extracts a title from a webpage.

Three flexible methods can help you: xPath, JSON, and Regex. Which one to use depends on the type of data you work with. If you work with HTML elements and websites, xPath is your friend. If you work with APIs or object structures, the JSON method will help you tremendously. If you work with unstructured text, Regex is the right tool.


Midesk offers multiple ways to process, mold, and extract exactly the content you want. There are three key methods you can use:

xPath

Use case: Structured HTML or XML processing

Description: XPath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).

You can use the xPath method to not only process traditional websites but also XML files and RSS.
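To make the idea concrete, here is a minimal Python sketch that selects the item titles from an RSS feed with an XPath expression. It is not Midesk's implementation: the feed content and the select_text helper are made up for illustration, and Python's built-in ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet standing in for a crawled feed.
RSS = """<rss><channel>
  <item><title>First post</title><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""


def select_text(xml_text: str, path: str) -> list[str]:
    """Return the text of every node matched by an ElementTree XPath expression."""
    root = ET.fromstring(xml_text)
    return [node.text or "" for node in root.findall(path)]


# ".//item/title" selects every <title> that is a direct child of an <item>.
titles = select_text(RSS, ".//item/title")
print(titles)
```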

JSON

Use case: Structured API & Objects processing

Description: JSON is an open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types. It is a very common data format, with a diverse range of applications, such as serving as a replacement for XML in AJAX.

Regex

Use case: Unstructured text processing

Description: A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
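As a quick illustration of "find" and "find and replace", the following Python sketch pulls e-mail addresses out of free text and then masks them. The sample text and the deliberately simple EMAIL pattern are assumptions for this example only; they are not taken from Midesk.

```python
import re

# Hypothetical unstructured text, for example the body of a crawled page.
text = "Contact sales@example.com or support@example.com for details."

# Simplified e-mail pattern; real-world address validation is more involved.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

# "Find": collect every match in the text.
emails = EMAIL.findall(text)

# "Find and replace": substitute every match with a placeholder.
redacted = EMAIL.sub("[redacted]", text)

print(emails)
print(redacted)
```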

Transform

The transform method uses so-called "dot notation". Dot notation lets you represent nested keys as plain text: each key is separated by a dot (.) sign.
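The way a dotted path walks through nested keys can be sketched in a few lines of Python. The get_by_dots helper and the sample record are hypothetical and only illustrate the idea; how Midesk resolves paths internally is not described here.

```python
from typing import Any


def get_by_dots(data: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "company.address.city" through nested dicts."""
    current: Any = data
    for key in path.split("."):
        # Each segment of the path selects one level deeper.
        current = current[key]
    return current


record = {"company": {"name": "Midesk", "address": {"city": "Hamburg"}}}
city = get_by_dots(record, "company.address.city")
print(city)
```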

Notation for simple elements
Input:
{
  "city": "Hamburg",
  "country": "Germany"
}
Expression:
Location: ${ city } (${ country })
Output:
[
  "Location: Hamburg (Germany)"
]
Notation for chunked elements
Input:
[
  {
    "title": "Company",
    "text": "Strategy"
  },
  {
    "title": "Other company",
    "text": "Market Intelligence"
  }
]
Expression:
Location: ${ title } SOME TEXT ${ text }
Output:
[
  "Location: Company SOME TEXT Strategy",
  "Location: Other company SOME TEXT Market Intelligence"
]
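The two notations above follow the same rule: a single object produces one output string, and a list of objects produces one output string per element. A rough Python sketch of that behaviour, assuming flat keys and a hypothetical render helper (not Midesk's actual code):

```python
import re
from typing import Any

# Matches placeholders such as ${ city } and captures the key name.
PLACEHOLDER = re.compile(r"\$\{\s*(\w+)\s*\}")


def render(expression: str, data: Any) -> list[str]:
    """Fill the expression once per record; a single object counts as one record."""
    records = data if isinstance(data, list) else [data]
    return [
        PLACEHOLDER.sub(lambda match: str(record[match.group(1)]), expression)
        for record in records
    ]


single = render("Location: ${ city } (${ country })",
                {"city": "Hamburg", "country": "Germany"})
chunked = render("Location: ${ title } SOME TEXT ${ text }",
                 [{"title": "Company", "text": "Strategy"},
                  {"title": "Other company", "text": "Market Intelligence"}])
print(single)
print(chunked)
```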
Pipes

The transform method supports many pipes that help you work with the text. Pipes can be chained and perform operations from left to right. Pipes are denoted by a pipe | sign. Pipes may have one or two additional parameters which are separated by a colon : sign.

Basic pipe example
Input:
{
  "city": "Hamburg",
  "country": "Germany"
}
Expression:
Location: ${ city | upper } (${ country | upper })
Output:
[
  "Location: HAMBURG (GERMANY)"
]
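Chaining means that the output of one pipe becomes the input of the next. The Python sketch below shows that left-to-right evaluation with three example pipes. The PIPES table and the apply_pipes helper are hypothetical, and the parser is deliberately naive: it does not handle a | or : sign inside a quoted parameter.

```python
from typing import Callable

# A tiny registry of example pipes; each takes the value plus optional parameters.
PIPES: dict[str, Callable[..., str]] = {
    "upper": lambda s: s.upper(),
    "trim": lambda s: s.strip(),
    "after": lambda s, needle: s.partition(needle)[2] if needle in s else s,
}


def apply_pipes(value: str, chain: str) -> str:
    """Apply pipes from left to right, e.g. 'trim | after : "Some" | upper'."""
    for segment in chain.split("|"):
        # The pipe name comes first; parameters follow, separated by colons.
        name, *params = [part.strip() for part in segment.split(":")]
        params = [param.strip('"') for param in params]
        value = PIPES[name](value, *params)
    return value


result = apply_pipes("  Some random text 123 ", 'trim | after : "Some random" | upper')
print(repr(result))
```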

after

The after pipe returns everything after the given value in a string. The entire string will be returned if the value does not exist within the string:

//Input

{"text": "Some random text 123"}

// Processing step

${text | after : "Some random"}

// Output:  text 123

append

The append pipe appends the given values to the string:

//Input

{"text": "Some random text 123"}

// Processing step

${text | append : " 456"}

// Output: Some random text 123 456

afterLast

The afterLast pipe returns everything after the last occurrence of the given value in a string. The entire string will be returned if the value does not exist within the string:

//Input

{"text": "Some random - logo text - logo text"}

// Processing step

${text | afterLast : " - "}

// Output: logo text

ascii

The ascii pipe will attempt to transliterate the string into an ASCII value.

before

The before pipe returns everything before the given value in a string:

//Input

{"text": "Some random - logo text - logo text"}

// Processing step

${text | before : " - "}

// Output: Some random

beforeLast

The beforeLast pipe returns everything before the last occurrence of the given value in a string:

//Input

{"text": "Some random - logo text - logo text"}

// Processing step

${text | beforeLast : " - "}

// Output: Some random - logo text
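The difference between after, afterLast, before, and beforeLast is only which occurrence of the value is used and which side of it is kept. The following Python sketch mirrors that logic. It assumes that all four return the entire string when the value is not found, which the text above only states for after and afterLast.

```python
def after(text: str, value: str) -> str:
    """Everything after the first occurrence of value."""
    _, found, tail = text.partition(value) if value else ("", "", "")
    return tail if found else text


def after_last(text: str, value: str) -> str:
    """Everything after the last occurrence of value."""
    _, found, tail = text.rpartition(value) if value else ("", "", "")
    return tail if found else text


def before(text: str, value: str) -> str:
    """Everything before the first occurrence of value."""
    head, found, _ = text.partition(value) if value else ("", "", "")
    return head if found else text


def before_last(text: str, value: str) -> str:
    """Everything before the last occurrence of value."""
    head, found, _ = text.rpartition(value) if value else ("", "", "")
    return head if found else text


sample = "Some random - logo text - logo text"
print(repr(after(sample, " - ")))
print(repr(after_last(sample, " - ")))
print(repr(before(sample, " - ")))
print(repr(before_last(sample, " - ")))
```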

between

The between pipe returns the portion of a string between two values:

//Input

{"text": "Some random - logo text - logo text"}

// Processing step

${text | between : "Some" : "text"}

// Output:  random - logo text - logo
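The output above suggests that between starts after the first occurrence of the first value and stops before the last occurrence of the second value. A Python sketch under that assumption (the between function here is illustrative, not Midesk's code):

```python
def between(text: str, start: str, end: str) -> str:
    """Portion of text after the first `start` and before the last `end`."""
    if not start or not end:
        return text
    _, found, tail = text.partition(start)
    rest = tail if found else text
    head, found, _ = rest.rpartition(end)
    return head if found else rest


sample = "Some random - logo text - logo text"
middle = between(sample, "Some", "text")
print(repr(middle))
```

Note that the surrounding spaces are part of the result; chain a trim pipe if you do not want them.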

camel

The camel pipe converts the given string to camelCase.

contains

The contains pipe determines if the given string contains the given value. This pipe is case sensitive.

endsWith

The endsWith pipe determines if the given string ends with the given value.

finish

The finish pipe adds a single instance of the given value to a string if it does not already end with that value.

is

The is pipe determines if a given string matches a given pattern. Asterisks may be used as wildcard values.
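Wildcard matching of this kind can be sketched in Python by turning each asterisk into "any run of characters" and requiring the whole string to match. The matches function is a hypothetical stand-in for the is pipe, and it assumes matching is case sensitive.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """True if value matches pattern, where * stands for any run of characters."""
    # Escape everything literally, then join the pieces with ".*" for each asterisk.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(matches("Market*", "Market Intelligence"))
print(matches("*/press/*", "https://example.com/press/2024"))
print(matches("Market", "Market Intelligence"))
```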

isAscii

The isAscii pipe determines if a given string is an ASCII string.

isUuid

The isUuid pipe determines if the given string is a valid UUID.

kebab

The kebab pipe converts the given string to kebab-case.

length

The length pipe returns the length of the given string.

limit

The limit pipe truncates the given string to the specified length.

lower

The lower pipe converts the given string to lowercase.

ltrim

The ltrim pipe trims the left side of the string.

match

The match pipe will return the portion of a string that matches a given regular expression pattern:

//Input

{"text": "Some random - logo text - logo text"}

// Processing step

${text | match : "Some (.*) text"}

// Output: random - logo text - logo

matchAll

The matchAll pipe will return a collection containing the portions of a string that match a given regular expression pattern.

padBoth

The padBoth pipe pads both sides of a string with another string until the final string reaches the desired length.

padLeft

The padLeft pipe pads the left side of a string with another string until the final string reaches the desired length.

padRight

The padRight pipe pads the right side of a string with another string until the final string reaches the desired length.
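The three padding pipes differ only in where the filler goes. The Python sketch below illustrates this. The function names are hypothetical, and placing the extra character on the right when the padding cannot be split evenly is an assumption, not documented Midesk behaviour.

```python
def _fill(pad: str, count: int) -> str:
    """Repeat pad and cut it to exactly count characters."""
    return (pad * count)[:count] if pad else ""


def pad_left(text: str, length: int, pad: str = " ") -> str:
    return _fill(pad, max(0, length - len(text))) + text


def pad_right(text: str, length: int, pad: str = " ") -> str:
    return text + _fill(pad, max(0, length - len(text)))


def pad_both(text: str, length: int, pad: str = " ") -> str:
    missing = max(0, length - len(text))
    left = missing // 2  # assumption: any odd remainder goes to the right side
    return _fill(pad, left) + text + _fill(pad, missing - left)


print(pad_left("James", 10, "-="))
print(pad_right("James", 10, "-"))
print(pad_both("James", 10, "_"))
```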

plural

The plural pipe converts a singular word string to its plural form. This function currently only supports the English language:

//Input

{"text": "employee"}

// Processing step

${text | plural}

// Output: employees

remove

The remove pipe removes the given value or array of values from the string.

replaceFirst

The replaceFirst pipe replaces the first occurrence of a given value in a string.

replaceLast

The replaceLast pipe replaces the last occurrence of a given value in a string.

replaceMatches

The replaceMatches pipe replaces all portions of a string matching a pattern with the given replacement string.

rtrim

The rtrim pipe trims the right side of the given string.

singular

The singular pipe converts a string to its singular form. This function currently only supports the English language:

//Input

{"text": "employees"}

// Processing step

${text | singular}

// Output: employee

slug

The slug pipe generates a URL friendly "slug" from the given string.
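A slug is typically built in three steps: transliterate to ASCII, lowercase, and replace every run of other characters with a separator. The Python sketch below follows those steps. It is an approximation for illustration: the exact rules Midesk applies (for example how it transliterates umlauts) may differ.

```python
import re
import unicodedata


def slug(text: str, separator: str = "-") -> str:
    """Build a URL friendly slug: ASCII only, lowercase, words joined by separator."""
    # Split accented characters into base letter + accent, then drop the accents.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters that is not a letter or digit with the separator.
    return re.sub(r"[^a-z0-9]+", separator, ascii_text.lower()).strip(separator)


print(slug("Märkte & Intelligence 2024!"))
```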

snake

The snake pipe converts the given string to snake_case.

start

The start pipe adds a single instance of the given value to a string if it does not already start with that value.
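The start pipe and the finish pipe are mirror images, which makes them handy for normalizing URLs and paths. A minimal Python sketch of both follows. The function names are illustrative, and whether repeated occurrences of the value are collapsed is not stated here, so this sketch leaves them untouched.

```python
def start(text: str, value: str) -> str:
    """Prefix text with value unless it already starts with it."""
    return text if text.startswith(value) else value + text


def finish(text: str, value: str) -> str:
    """Suffix text with value unless it already ends with it."""
    return text if text.endswith(value) else text + value


print(start("press/releases", "/"))
print(finish("/press/releases", "/"))
```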

startsWith

The startsWith pipe determines if the given string begins with the given value.

studly

The studly pipe converts the given string to StudlyCase.
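The case pipes in this reference (camel, kebab, snake, studly) all split the input into words and then join the words differently. The Python sketch below shows one way to do that. The word-splitting rule is an assumption for illustration, and edge cases such as acronyms may behave differently in Midesk.

```python
import re


def words(text: str) -> list[str]:
    """Split on non-alphanumerics and on lowercase-to-uppercase boundaries."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[A-Za-z0-9]+", spaced)


def studly(text: str) -> str:
    return "".join(word.capitalize() for word in words(text))


def camel(text: str) -> str:
    result = studly(text)
    return result[:1].lower() + result[1:]


def kebab(text: str) -> str:
    return "-".join(word.lower() for word in words(text))


def snake(text: str) -> str:
    return "_".join(word.lower() for word in words(text))


print(camel("market intelligence"))
print(studly("market_intelligence"))
print(kebab("marketIntelligence"))
print(snake("Market Intelligence"))
```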

substr

The substr pipe returns the portion of the string specified by the given start and length parameters.

substrCount

The substrCount pipe returns the number of occurrences of a given value in the given string.

title

The title pipe converts the given string to Title Case:

//Input

{"text": "market intelligence"}

// Processing step

${text | title}

// Output: Market Intelligence

trim

The trim pipe trims the given string:

//Input

{"text": "      market intelligence      "}

// Processing step

${text | trim}

// Output: market intelligence

ucfirst

The ucfirst pipe returns the given string with the first character capitalized:

//Input

{"text": "market intelligence"}

// Processing step

${text | ucfirst}

// Output: Market intelligence

upper

The upper pipe converts the given string to uppercase:

//Input

{"text": "market intelligence"}

// Processing step

${text | upper}

// Output: MARKET INTELLIGENCE

wordCount

The wordCount pipe returns the number of words that a string contains.


Pluck

The pluck method retrieves all of the values for a given key. It is superseded by the transform method, which is far more versatile.


firstWhere

The firstWhere method returns the first element in the collection with the given key / value pair.


skipNth

The skipNth method returns a new collection that skips every nth member of items.


keyToNumber

The keyToNumber method transforms all of the values for a given key into numbers.


sortBy

The sortBy method sorts the collection by the given key.


sortByDesc

The sortByDesc method has the same signature as the sortBy method, but sorts the collection in the opposite (descending) order.


chunk

The chunk method breaks the collection into multiple, smaller collections of a given size.
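Chunking is easiest to understand with a small example. The Python sketch below splits a list into groups of a given size; the last group is simply shorter when the items do not divide evenly. The chunk function is illustrative and not Midesk's own code.

```python
from typing import Any


def chunk(items: list[Any], size: int) -> list[list[Any]]:
    """Break items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


groups = chunk(["title 1", "text 1", "title 2", "text 2", "title 3"], 2)
print(groups)
```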


take

The take method returns a new collection with the specified number of items.


where

The where method filters the collection by a given key / value pair.
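Filtering and sorting by key work on a collection of objects that share the same keys. The Python sketch below mirrors where, firstWhere, sortBy, and sortByDesc on a list of dictionaries. The sample rows and function names are made up for illustration.

```python
from typing import Any, Optional

Row = dict[str, Any]


def where(rows: list[Row], key: str, value: Any) -> list[Row]:
    """Keep only the rows whose key equals value."""
    return [row for row in rows if row.get(key) == value]


def first_where(rows: list[Row], key: str, value: Any) -> Optional[Row]:
    """Return the first row whose key equals value, or None if there is none."""
    return next((row for row in rows if row.get(key) == value), None)


def sort_by(rows: list[Row], key: str, descending: bool = False) -> list[Row]:
    """Return a new list sorted by key; descending=True mirrors sortByDesc."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


rows = [
    {"name": "A", "country": "DE", "employees": 120},
    {"name": "B", "country": "FR", "employees": 45},
    {"name": "C", "country": "DE", "employees": 300},
]
german = [row["name"] for row in where(rows, "country", "DE")]
smallest_first = [row["name"] for row in sort_by(rows, "employees")]
largest_first = [row["name"] for row in sort_by(rows, "employees", descending=True)]
print(german, smallest_first, largest_first)
```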


first

The first method returns the first element in the collection that passes a given expression test.


last

The last method returns the last element in the collection that passes a given expression test.


get

The get method returns the item at a given key.


count

The count method returns the total number of items in the collection.


keys

The keys method returns all of the collection's keys.


sum

The sum method returns the sum of all items in the collection with the given key.


max

The max method returns the maximum value of a given key.


median

The median method returns the middle value (the value separating the collection) of a given key.


min

The min method returns the minimum value of a given key.


mode

The mode method returns the value that appears most often of a given key.


unique

The unique method returns all of the unique items in the collection.


average

The average method returns the average value of a given key.
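The aggregate methods (sum, max, min, median, mode, average) all start by collecting the values of one key and then reduce them to a single number. The Python sketch below uses the standard statistics module to show the expected results on a small, made-up collection.

```python
import statistics
from typing import Any


def values(rows: list[dict[str, Any]], key: str) -> list[Any]:
    """Collect the value of key from every row."""
    return [row[key] for row in rows]


rows = [{"price": 10}, {"price": 20}, {"price": 20}, {"price": 50}]
prices = values(rows, "price")

summary = {
    "sum": sum(prices),
    "max": max(prices),
    "min": min(prices),
    "median": statistics.median(prices),  # middle value of the sorted prices
    "mode": statistics.mode(prices),      # most frequent value
    "average": statistics.mean(prices),
}
print(summary)
```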