CodVerter | How to Extract IPv4 and IPv6 IP Addresses from Plain Text Using Regex

There are many scenarios where IP addresses need to be extracted from plain text, for example backbone router configuration output, large volumes of security log messages from syslog servers, routing tables and many more.

In this article we will learn the most common IP address patterns that often need to be extracted and isolated from plain text, and how to match each of them using regular expressions.

These examples are taken from the source code of CodVerter's IP Extractor, a tool for extracting and detecting IP addresses.

Most Common IP Patterns to Scan and Identify

IPv4 Host Address

192.168.10.50
111.28.14.1
8.8.8.8

Sample of IPv4 Host Address

In this scenario the pattern contains 4 octets separated by dots, where each octet is 1 to 3 digits long. A regular expression pattern that represents an IPv4 host address is:

[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}

We can make this regex pattern shorter by putting the first 3 octets, each followed by a dot, in a non-capturing group repeated {3} times, and then adding [0-9]{1,3} for the last octet, which doesn't end with a dot. Here is a more compact version of the pattern above:

(?:[0-9]{1,3}[.]){3}[0-9]{1,3}
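To see the two forms side by side, here is a minimal Python sketch (Python and the sample line are our own choices for illustration, not taken from the IP Extractor) that runs both patterns over a made-up log line. Both return the same matches, and neither checks octet ranges, so 999.10.10.10 is still extracted.

```python
import re

# Long form: four octets of 1-3 digits separated by literal dots.
LONG_IPV4 = re.compile(r"[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}")
# Short form: "octet + dot" repeated 3 times, then the last octet.
SHORT_IPV4 = re.compile(r"(?:[0-9]{1,3}[.]){3}[0-9]{1,3}")

# Hypothetical log line for illustration only.
text = "dns 8.8.8.8 gw 192.168.10.50 peer 111.28.14.1 bogus 999.10.10.10"

long_hits = LONG_IPV4.findall(text)
short_hits = SHORT_IPV4.findall(text)

# Both forms match the same strings; no octet range check is done.
print(short_hits)
print(long_hits == short_hits)
```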

Validating IPv4 Address

In many cases we can assume that every address in our source data is already valid. By a "valid address" we mean that every octet is a number from 0 to 255. But what if we also want to validate the IPs in the data?
We need a regex pattern that restricts every octet to the range 0 to 255, in addition to detecting the IPv4 format:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
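A short Python sketch of the validating pattern, run on a hypothetical line. Out-of-range values such as 256.1.1.1 and 192.168.1.300 are skipped entirely, because the \b anchors stop the engine from matching a shorter piece inside them.

```python
import re

# Each octet is 25[0-5], 2[0-4][0-9], or an optional 0/1 followed by 1-2 digits.
VALID_IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)

# Hypothetical input mixing valid and out-of-range addresses.
text = "ok 10.0.0.1 ok 255.255.255.255 bad 256.1.1.1 bad 192.168.1.300"

print(VALID_IPV4.findall(text))
```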

IPv4 Address with Prefix

An IP address can also carry a prefix in “slash notation”, which states how many leading bits of the address make up the network part.

77.54.11.1/32
192.168.0.1/24
172.15.0.0/16

Sample of IPv4 Address with Prefix

This scenario is very similar to the first one, with an extra sub-pattern to identify the slash notation. An IPv4 prefix always ends with a slash followed by 1 or 2 digits, because the prefix length is at most 32, as the table shows:

Prefix	Subnet Mask	Wildcard Mask
/1	128.0.0.0	127.255.255.255
/2	192.0.0.0	63.255.255.255
/3	224.0.0.0	31.255.255.255
/4	240.0.0.0	15.255.255.255
/5	248.0.0.0	7.255.255.255
/6	252.0.0.0	3.255.255.255
/7	254.0.0.0	1.255.255.255
/8	255.0.0.0	0.255.255.255
/9	255.128.0.0	0.127.255.255
/10	255.192.0.0	0.63.255.255
/11	255.224.0.0	0.31.255.255
/12	255.240.0.0	0.15.255.255
/13	255.248.0.0	0.7.255.255
/14	255.252.0.0	0.3.255.255
/15	255.254.0.0	0.1.255.255
/16	255.255.0.0	0.0.255.255
/17	255.255.128.0	0.0.127.255
/18	255.255.192.0	0.0.63.255
/19	255.255.224.0	0.0.31.255
/20	255.255.240.0	0.0.15.255
/21	255.255.248.0	0.0.7.255
/22	255.255.252.0	0.0.3.255
/23	255.255.254.0	0.0.1.255
/24	255.255.255.0	0.0.0.255
/25	255.255.255.128	0.0.0.127
/26	255.255.255.192	0.0.0.63
/27	255.255.255.224	0.0.0.31
/28	255.255.255.240	0.0.0.15
/29	255.255.255.248	0.0.0.7
/30	255.255.255.252	0.0.0.3
/31	255.255.255.254	0.0.0.1
/32	255.255.255.255	0.0.0.0

IP Prefix Table
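The table can be generated rather than typed by hand. This Python sketch uses the standard ipaddress module (our choice, not part of the article's tooling): IPv4Network.netmask gives the subnet mask, and IPv4Network.hostmask gives its bitwise inverse, which is the wildcard mask.

```python
import ipaddress

def mask_row(prefix_len):
    """Return (prefix, subnet mask, wildcard mask) as strings for an IPv4 prefix length."""
    net = ipaddress.IPv4Network(f"0.0.0.0/{prefix_len}")
    # netmask = subnet mask; hostmask = its bitwise inverse (the wildcard mask).
    return f"/{prefix_len}", str(net.netmask), str(net.hostmask)

# Print the table as tab-separated rows.
print("Prefix\tSubnet Mask\tWildcard Mask")
for n in range(1, 33):
    print("\t".join(mask_row(n)))
```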

We will add /[0-9]{1,2} to the previous regular expression and get the following:

(?:[0-9]{1,3}[.]){3}[0-9]{1,3}/[0-9]{1,2}
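Applied in Python to a hypothetical routing-table excerpt, the prefix pattern picks up only the addresses that carry slash notation; the next hop 10.1.1.1 has no prefix and is skipped.

```python
import re

# Unvalidated IPv4 + prefix pattern from above.
IPV4_PREFIX = re.compile(r"(?:[0-9]{1,3}[.]){3}[0-9]{1,3}/[0-9]{1,2}")

# Hypothetical routing-table excerpt for illustration only.
routes = """\
S 172.15.0.0/16 via 10.1.1.1
C 192.168.0.1/24 is directly connected
L 77.54.11.1/32 is local
"""

print(IPV4_PREFIX.findall(routes))
```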

Validating IPv4 Address with Prefix

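Just as we did before, we can validate the octets and then add the slash-notation sub-pattern. The slash goes after the closing parenthesis of the last octet group, so the prefix is required whatever the value of the last octet:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[0-9]{1,2}\b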
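A Python sketch of the validating prefix pattern, built from a shared octet sub-pattern. The second, stricter variant, which limits the prefix length to 0-32, is our own addition and not part of the article's pattern; it drops 10.0.0.0/99, which the /[0-9]{1,2} version accepts.

```python
import re

# One validated octet (0-255).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# The slash sits outside the last octet group, so it applies to every octet value.
VALID_IPV4_PREFIX = re.compile(rf"\b(?:{OCTET}[.]){{3}}{OCTET}/[0-9]{{1,2}}\b")

# Stricter variant (our addition): prefix length 0-32 only.
STRICT_IPV4_PREFIX = re.compile(rf"\b(?:{OCTET}[.]){{3}}{OCTET}/(?:3[0-2]|[12]?[0-9])\b")

# Hypothetical input.
text = "10.0.0.250/24 192.168.0.1/24 300.1.1.1/8 10.0.0.0/99"

print(VALID_IPV4_PREFIX.findall(text))
print(STRICT_IPV4_PREFIX.findall(text))
```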

Subnet Mask or Wildcard Mask Address

A subnet mask is a 32-bit number that divides an IP address into its network and host parts. Like an address, a subnet mask has 4 octets, but each octet can only take one of these values: 255, 254, 252, 248, 240, 224, 192, 128, 0.

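A wildcard mask is the bitwise inverse of a subnet mask: every 1 bit becomes 0 and every 0 bit becomes 1, so each wildcard octet equals 255 minus the matching subnet mask octet. Wildcard masks are used in various cases, such as declaring an address range in a Cisco router access list.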

Subnet Mask	Wildcard Mask
255.255.255.0	0.0.0.255
255.0.0.0	0.255.255.255
128.0.0.0	127.255.255.255

Samples of subnet masks and their corresponding wildcard masks

A regular expression pattern that matches a subnet mask, with each octet restricted to the values listed above, is:

(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)

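Just like we did with the IPv4 address above, we can write the expression in a shorter way by repeating a non-capturing group three times:

(?:(255|254|252|248|240|224|192|128|0)[.]){3}(255|254|252|248|240|224|192|128|0)

Because a wildcard mask is the bitwise inverse of a subnet mask, its octets take the values 0, 1, 3, 7, 15, 31, 63, 127 and 255 instead; to match wildcard masks as well, add 1, 3, 7, 15, 31, 63 and 127 to each alternation.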
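A Python sketch of the subnet mask pattern, rewritten with non-capturing groups so that re.findall() returns whole matches, and with \b anchors added (both changes are ours). Because every octet is checked on its own, the pattern also accepts non-contiguous values such as 255.0.255.0, so the sketch adds a hypothetical is_contiguous_mask() helper that checks the bits with integer arithmetic: a valid mask is a run of 1s followed by 0s, so its inverse must have the form 2^k - 1.

```python
import ipaddress
import re

MASK_OCTET = r"(?:255|254|252|248|240|224|192|128|0)"
# Subnet mask pattern with non-capturing groups and \b anchors.
SUBNET_MASK = re.compile(rf"\b(?:{MASK_OCTET}[.]){{3}}{MASK_OCTET}\b")

def is_contiguous_mask(mask):
    """True if the mask is a run of 1 bits followed by 0 bits (hypothetical helper)."""
    value = int(ipaddress.IPv4Address(mask))
    inverted = value ^ 0xFFFFFFFF
    # The inverse of a valid mask looks like 0...01...1, i.e. 2**k - 1.
    return inverted & (inverted + 1) == 0

# Hypothetical configuration line.
text = "ip address 10.1.1.1 255.255.255.0 ; odd 255.0.255.0 ; host 192.168.1.1"

candidates = SUBNET_MASK.findall(text)
print(candidates)
print([m for m in candidates if is_contiguous_mask(m)])
```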

Filtering Out Subnet Mask / Wildcard Mask Addresses

In real life there are cases where we want to separate subnet masks from the other IP addresses in our source data. We can combine both of the previous regex patterns, placing the subnet mask pattern inside a negative lookahead, to get only the addresses that are not subnet masks:

\b(?!(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)\b)(?:[0-9]{1,3}[.]){3}[0-9]{1,3}\b

The subnet mask pattern at the beginning of the regex is wrapped in a negative lookahead, (?!...), so any position where a complete subnet mask starts is rejected, and we get only the IPs that are not subnet masks. The leading \b prevents the engine from restarting inside a mask and returning a fragment such as 55.255.255.0 from 255.255.255.0.
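Here is the lookahead filter as a Python sketch, with \b anchors before the lookahead, at its end, and after the address. The sketch uses non-capturing groups because, in Python, capturing groups inside the lookahead would make re.findall() return tuples of empty strings instead of the matched addresses. The router output is hypothetical.

```python
import re

MASK_OCTET = r"(?:255|254|252|248|240|224|192|128|0)"
SUBNET_MASK = rf"(?:{MASK_OCTET}[.]){{3}}{MASK_OCTET}"

# \b(?!mask\b) rejects any position where a complete subnet mask starts;
# the leading \b keeps the engine from restarting inside a mask.
NOT_A_MASK = re.compile(
    rf"\b(?!{SUBNET_MASK}\b)(?:[0-9]{{1,3}}[.]){{3}}[0-9]{{1,3}}\b"
)

# Hypothetical router output.
text = ("interface 10.1.1.1 255.255.255.0 "
        "route 172.16.0.0 255.255.0.0 via 10.1.1.254")

print(NOT_A_MASK.findall(text))
```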

Here is an example from CodVerter's Multiple Regex Pattern Scanner:

Sample of filtering out subnet mask addresses using regex

Note: if you intend to use this pattern in a JavaScript web project, the negative lookahead (?!...) is supported by all major browsers. Lookbehind assertions, (?<=...) and (?<!...), are a different matter: at the time of writing, Firefox does not support them.

IPv6 Address

IPv6 is a 128-bit address, written as eight groups of up to four hexadecimal digits separated by colons. Leading zeros in a group can be omitted, and a single run of consecutive all-zero groups can be replaced with “::”. Compared with IPv4, which uses only 32 bits, IPv6 has approximately 3.4×10^38 addresses and was designed to provide an address pool that will practically never run out. The regular expression for it is considerably more cumbersome, because there are many sub-pattern options.

2001:0002:6c::430
ff01:0:0:0:0:0:0:2
::1
2001:0000:3238:DFE1:63:0000:0000:FEFB
2001:cdba::3257:9652
2001:cdba:0000:0000:0000:0000:3257:9652

Sample of IPv6 Addresses

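The following regular expression covers all the hexadecimal sub-patterns with multiple OR conditions, including addresses whose last 32 bits are written in dotted IPv4 notation. When using it to extract addresses from free text, keep in mind that the leading and trailing \s* make surrounding whitespace part of the match, and that the optional zone index group (%.+)? is greedy: if an address is followed by something like %eth0, it consumes everything up to the end of the line, so you may want to narrow it to (%\w+)?. The full pattern is: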

\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(%.+)?\s*
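Because this expression is hard to read and maintain, a different technique may be worth considering in Python: split the text into whitespace-separated tokens and let the standard ipaddress module decide which tokens are IPv6 addresses. This swaps the regex for a parser and is not how the IP Extractor works; the sample line is hypothetical, and the sketch does not try to handle zone indexes such as %eth0.

```python
import ipaddress

def extract_ipv6(text):
    """Return whitespace-separated tokens that parse as IPv6 addresses, as written."""
    found = []
    for token in text.split():
        token = token.strip(".,;()[]")  # drop surrounding punctuation
        if ":" not in token:
            continue  # every IPv6 address contains at least one colon
        try:
            ipaddress.IPv6Address(token)
        except ValueError:
            continue  # e.g. a timestamp like 12:30:45 or a MAC address
        found.append(token)
    return found

# Hypothetical log line; the addresses come from the samples above.
sample = ("peer 2001:cdba::3257:9652 up, lo ::1, "
          "mcast ff01:0:0:0:0:0:0:2 at 12:30:45 mac 00:1a:2b:3c:4d:5e")
print(extract_ipv6(sample))
```

Tokens are returned exactly as written; call str(ipaddress.IPv6Address(token)) if you need the compressed, normalized form instead.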

IPv6 Address with Prefix

Similar to IPv4 addresses, IPv6 addresses can also include a prefix in “slash notation” that describes how many bits are contained in the network. However, unlike IPv4, an IPv6 prefix length can go up to 128, so the slash can be followed by up to 3 digits. To match it, append /[0-9]{1,3} after the address part of the previous regular expression (before the optional zone index group).
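The same token-based approach in Python extends to prefixes. In this sketch, ipaddress.IPv6Interface accepts an address with host bits set before the slash (2001:db8::1/64) and rejects prefix lengths above 128, so the 0-128 range check comes for free. The sample uses the 2001:db8::/32 documentation range and is hypothetical.

```python
import ipaddress

def extract_ipv6_prefixes(text):
    """Return tokens that parse as IPv6 address/prefix-length pairs (prefix 0-128)."""
    found = []
    for token in text.split():
        token = token.strip(".,;()[]")  # drop surrounding punctuation
        if ":" not in token or "/" not in token:
            continue  # needs both an IPv6-looking part and a slash
        try:
            # IPv6Interface allows host bits, e.g. 2001:db8::1/64.
            ipaddress.IPv6Interface(token)
        except ValueError:
            continue  # e.g. a prefix length above 128
        found.append(token)
    return found

sample = "route 2001:db8::/32 iface 2001:db8::1/64 bad 2001:db8::/129 host ::1"
print(extract_ipv6_prefixes(sample))
```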

Samples

Here you can see how CodVerter's IP Extractor handles configuration output from a backbone Juniper router and extracts all IP addresses with their number of occurrences:

Sample of configuration output from backbone Juniper router
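A rough Python sketch of counting occurrences, in the spirit of the output described above: find every validated IPv4 address and tally it with collections.Counter. The configuration snippet is a hypothetical, abbreviated Junos-style example, not the one from the sample above, and this is not the IP Extractor's actual code.

```python
import re
from collections import Counter

# Validating IPv4 pattern from earlier in this article.
VALID_IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)

# Hypothetical, abbreviated Junos-style configuration (illustration only).
config = """\
set interfaces ge-0/0/0 unit 0 family inet address 10.0.0.1/30
set protocols bgp group EXT neighbor 10.0.0.2
set routing-options static route 0.0.0.0/0 next-hop 10.0.0.2
"""

counts = Counter(VALID_IPV4.findall(config))
for ip, n in counts.most_common():
    print(ip, n)
```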

Using CodVerter's IP Extractor to mark the different IP address patterns in a Cisco router ACL configuration:

Sample of marking different IP address patterns from Cisco router ACL configuration
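Marking tokens by type can be approximated with integer arithmetic. This Python sketch uses a hypothetical classify() helper: a value whose bits are 1s followed by 0s is labeled a subnet mask, 0s followed by 1s a wildcard mask, and anything else an address. It is a heuristic: 0.0.0.0 and 255.255.255.255 fit both mask shapes and are labeled subnet masks here. The ACL line is made up, and the address pattern is the unvalidated one from the start of the article with \b anchors added.

```python
import ipaddress
import re

# Unvalidated dotted-quad pattern, with \b anchors added.
IPV4 = re.compile(r"\b(?:[0-9]{1,3}[.]){3}[0-9]{1,3}\b")

def classify(ip):
    """Heuristically label a dotted quad (hypothetical helper)."""
    try:
        value = int(ipaddress.IPv4Address(ip))
    except ValueError:
        return "invalid"  # e.g. an octet above 255
    inverted = value ^ 0xFFFFFFFF
    if inverted & (inverted + 1) == 0:
        return "subnet mask"    # 1s then 0s, e.g. 255.255.255.0
    if value & (value + 1) == 0:
        return "wildcard mask"  # 0s then 1s, e.g. 0.0.0.255
    return "address"

# Hypothetical Cisco extended ACL entry.
acl = "access-list 101 permit ip 192.168.1.0 0.0.0.255 host 10.1.1.1"
for ip in IPV4.findall(acl):
    print(ip, classify(ip))
```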

We hope you find this useful. Feel free to contact us and to join CodVerter's community.

Let's CodVert!

Author: Jonathan @CodVerterTeam

Date: 5 January 2019