Answered

How do I substring?

anonymous 10 years ago in Studio / Toolbox updated by Gandalf 9 years ago 5
I have a data payload that I need to perform sub-strings on.
I can get the starting and ending indexes, but have no idea how to use these to create a sub-string.
I have checked the calculate functions and tried using the DataSplit tool.
The best I have managed to do is use the DataSplit tool to get everything after the start index, but I cannot seem to get it to stop at the end index without using a second DataSplit tool.
It's frustrating to use 4 tools to perform such a simple function (2 Find Index and 2 DataSplit tools).
I will not create a workflow to do this as I need to use First and Last index in the Find Index tool depending on which string I am searching for. 

Can you please advise how I can do this with 3 or fewer tools? I understand the need to use 2 Find Index tools. I would like to know how to use a single tool to sub-string.

Answer

PINNED
Hi Trav

Using the last index in the Data Split tool won't work in this way. This is because the index is relative to the last split in the Data Split tool. It makes working with fixed width files absolutely awesome, but not so great for what you are trying to do.
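To illustrate why relative indexes suit fixed-width data, here is a minimal Python sketch. It is not the tool's implementation; it assumes each index is a character count measured from the end of the previous split, and `split_by_widths` is a hypothetical helper name.

```python
def split_by_widths(record, widths):
    """Cut a fixed-width record into fields.

    Each width is counted from the end of the previous field,
    which mirrors an index that is relative to the last split.
    """
    fields = []
    position = 0
    for width in widths:
        fields.append(record[position:position + width])
        position += width  # the next index starts where this split ended
    return fields


# Hypothetical record layout: 4-char id, 6-char first name, 5-char surname.
fields = split_by_widths("0001JOHN  SMITH", [4, 6, 5])
print(fields)
```

With relative widths the layout can be described field by field, but an absolute end position (as returned by a find-index step) does not map onto it directly.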
Don't worry, there is an easy solution:

1. Using the Data Split tool with the process direction set to forward, get everything from the first tag to the end.
2. Using the Data Split tool with the process direction set to backward, get everything from the last tag back to the beginning.

This will leave you with the inner content of the outer tags.


2 steps not 3 ;)
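
The two steps above can be sketched in Python as follows. This only illustrates the idea, not how the Data Split tool works internally; `inner_content` is a hypothetical name, and the sketch assumes the tags themselves should be dropped from the result.

```python
def inner_content(payload, start_tag, end_tag):
    """Return the text between the first start_tag and the last end_tag.

    Returns "" if either tag is missing.
    """
    # Step 1 (forward): keep everything after the first start tag.
    after_start = payload.partition(start_tag)[2]
    # Step 2 (backward): keep everything before the last end tag.
    return after_start.rpartition(end_tag)[0]


page = "<html><body><p>one</p><p>two</p></body></html>"
print(inner_content(page, "<body>", "</body>"))
```

Because the second step searches backward, inner tags that repeat the end tag are left untouched.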
Hi Travis

The tool is designed to do a lot more than substring. Instead of searching for the index then using the index, why don't you try splitting on that criteria (CHAR) and you can do everything in one step?
Please post the example here if this doesn't make sense.
I am looking to page scrape.
This means I am unable to perform all my string manipulation in a single operation.
In fact I need to iteratively split my content string into smaller pieces based on a variety of tags. The start and end tag do not always match and there can be multiples of them in the data I am working with.

My Algo looks something like this... 

1) Given payload X, I need to locate a region in-between tags Q and V called payload Y.
2) Given payload Y, I need to locate a sub-region in-between tags R and S called payload Z.
3) Given payload Z, I need to search 8 different tag sets. 
4) Given payload Z', I may need to search a further 2 levels of tags down depending on what data region it is.
5) Given a final payload A, I need to clean the data up removing bits and pieces that need to be scrubbed out. ( There are 8 of these final pieces.)

Given that my algo needs to work down through varying levels of tag sets, I believe I need a sub-string feature. That is why I attempted to use 2 Find Index tools to locate the correct tags.
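
As a rough illustration of the drill-down described in steps 1 and 2, here is a minimal Python sketch. The function names and the sample tags are hypothetical. Unlike the PHP function in this post, it matches case-sensitively and leaves the start tag out of the result.

```python
def extract_between(payload, start_tag, end_tag, use_last=False):
    """Return the text between start_tag and the next end_tag, or "" if not found.

    use_last=True searches for the last occurrence of start_tag
    instead of the first.
    """
    start = payload.rfind(start_tag) if use_last else payload.find(start_tag)
    if start == -1:
        return ""
    start += len(start_tag)  # skip past the start tag itself
    end = payload.find(end_tag, start)
    if end == -1:
        return ""
    return payload[start:end]


def drill_down(payload, tag_pairs):
    """Narrow the payload one tag pair at a time (X -> Y -> Z ...)."""
    for start_tag, end_tag in tag_pairs:
        payload = extract_between(payload, start_tag, end_tag)
    return payload


x = "<Q>skip<R>keep me<S>tail<V>"
print(drill_down(x, [("<Q>", "<V>"), ("<R>", "<S>")]))
```

Each pass works only on the result of the previous one, so a missing tag at any level yields an empty string for everything below it.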

The PHP code I wrote to achieve this is below. I used it in a loop that iterates over the data and tag sets, building up what I need and pushing it into a MySQL database because of its upsert syntax. Duplication is very possible in the data I am working with and I needed to weed this out easily.

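function ExtractRegion($stringData, $startTag, $endTag, $isLast, &$endPos = null){
    $start = 0;

    if($isLast){
        // last occurrence of the start tag (case-insensitive)
        $start = strripos($stringData, $startTag);
    }else{
        if($startTag == ""){
            $start = 0; // trigger an extract from the start of the string
        }else{
            // first occurrence of the start tag (case-insensitive)
            $start = stripos($stringData, $startTag);
        }
    }

    // stripos/strripos return false when nothing is found; compare strictly,
    // because a match at position 0 is a valid result
    if($start !== false){
        $end = stripos($stringData, $endTag, $start);
        if($end !== false){
            $len = ($end - $start);

            // the region includes the start tag and excludes the end tag
            $tmp = substr($stringData, $start, $len);

            // only reported when the caller passed an initialised variable
            if($endPos !== null){
                $endPos = $end;
            }

            return $tmp;
        }
    }

    return "";
}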

Thanks.
Hi 

If you are looking for a simple sub-string, then you could use the calculate tool with the mid function. A simple example of this is mid([[a]] , 1 ,5). The calculate tool has most of the functions provided by Excel, so you could also use Left() and Right() depending on your particular use case.
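
For readers unfamiliar with these functions, here is a small Python sketch of what they return. It assumes the calculate tool follows Excel's conventions (a 1-based start position followed by a character count), which the reply implies but does not confirm; the Python functions are illustrative stand-ins, not the tool's API.

```python
def mid(text, start_num, num_chars):
    """Excel-style MID: start_num is 1-based, num_chars is a length."""
    if start_num < 1 or num_chars < 0:
        raise ValueError("start_num must be >= 1 and num_chars >= 0")
    begin = start_num - 1  # convert to Python's 0-based index
    return text[begin:begin + num_chars]


def left(text, num_chars):
    """Excel-style LEFT: the first num_chars characters (num_chars >= 0)."""
    return text[:num_chars]


def right(text, num_chars):
    """Excel-style RIGHT: the last num_chars characters (num_chars >= 0)."""
    return text[-num_chars:] if num_chars > 0 else ""


print(mid("substring", 1, 5))
print(left("substring", 3))
print(right("substring", 3))
```

Under the same assumptions, a start and end position from two Find Index steps would translate to a length of end minus start (plus one if the end position is inclusive).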

