A Go (Golang) package for extracting, parsing, and manipulating URLs.
- Flexible URL extraction from text using regular expressions.
- Domain parsing into subdomains, root domains, and TLDs.
- Extends the standard net/url URL parsing with additional fields.
go get -v -u github.com/hueristiq/hqgourl
package main

import (
	"fmt"

	"github.com/hueristiq/hqgourl"
)

func main() {
	// Build an extractor with the default options.
	extractor := hqgourl.NewURLExtractor()

	text := "Check out this website: https://example.com and send an email to info@example.com."

	// CompileRegex returns a compiled regular expression for URL matching.
	regex := extractor.CompileRegex()

	// -1 means "return every match".
	matches := regex.FindAllString(text, -1)

	fmt.Println("Found URLs:", matches)
}
The URLExtractor allows customization of the URL extraction process through various options. For instance, you can specify whether to include URL schemes and hosts in the extraction and provide custom regex patterns for these components.
- Extracting URLs with Specific Schemes
extractor := hqgourl.NewURLExtractor(
	hqgourl.URLExtractorWithSchemePattern(`(?:https?|ftp)://`),
)
This configuration will extract only URLs starting with http, https, or ftp schemes.
- Extracting URLs with Custom Host Patterns
extractor := hqgourl.NewURLExtractor(
	hqgourl.URLExtractorWithHostPattern(`(?:www\.)?example\.com`),
)
This setup will extract URLs that have hosts matching www.example.com or example.com.
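To see what patterns like these match in isolation, the sketch below uses only the standard regexp package. The helper buildPattern and the simple path tail it appends are illustrative assumptions; they are not part of hqgourl and do not reflect how the package composes its own regular expression.

```go
package main

import (
	"fmt"
	"regexp"
)

// buildPattern is a hypothetical helper: it concatenates a scheme pattern,
// a host pattern, and a very simple optional path tail.
// This is NOT how hqgourl builds its regex; it only shows the patterns at work.
func buildPattern(scheme, host string) *regexp.Regexp {
	return regexp.MustCompile(scheme + host + `(?:/[^\s,]*)?`)
}

func main() {
	re := buildPattern(`(?:https?|ftp)://`, `(?:www\.)?example\.com`)

	text := "See https://example.com/docs, ftp://www.example.com and ssh://example.com."

	// The ssh:// URL is skipped because its scheme is not in the scheme pattern.
	fmt.Println(re.FindAllString(text, -1))
}
```

Keeping the scheme and host alternatives inside non-capturing groups, as the patterns above do, matters here: without the grouping, concatenation would bind only to the last alternative.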
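Note: Because the API is centered around regexp.Regexp, the other methods of that type (for example, MatchString, FindAllStringIndex, and ReplaceAllString) are available on the compiled expression as well.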
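As a rough illustration of those other methods, the sketch below calls a few of them on a stand-in expression. The variable standIn and the helper redact are hypothetical, and the pattern is deliberately simplistic; in real use you would call the same methods on the value returned by CompileRegex, assuming it is a *regexp.Regexp as the first example suggests.

```go
package main

import (
	"fmt"
	"regexp"
)

// standIn is a simplistic placeholder for the expression returned by
// CompileRegex. It is an assumption for illustration, not hqgourl's pattern.
var standIn = regexp.MustCompile(`https?://[^\s]+`)

// redact replaces every match with a fixed marker.
func redact(text string) string {
	return standIn.ReplaceAllString(text, "[link]")
}

func main() {
	text := "Docs: https://example.com/a and http://example.org/b"

	// Report whether the text contains at least one match.
	fmt.Println(standIn.MatchString(text))

	// Byte offsets (start, end) of every match.
	fmt.Println(standIn.FindAllStringIndex(text, -1))

	// Replace every match.
	fmt.Println(redact(text))
}
```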
package main

import (
	"fmt"

	"github.com/hueristiq/hqgourl"
)

func main() {
	dp := hqgourl.NewDomainParser()

	// Split the domain into its subdomain, root domain, and TLD.
	parsedDomain := dp.Parse("subdomain.example.com")

	fmt.Printf("Subdomain: %s, Root Domain: %s, TLD: %s\n", parsedDomain.Sub, parsedDomain.Root, parsedDomain.TopLevel)
}
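To make the three components concrete, here is a naive, standard-library-only sketch of the split. The type parts and the function naiveSplit are hypothetical; the field names merely mirror the example above. The sketch assumes the TLD is always the last single label, so it mishandles multi-label suffixes such as co.uk, which is the kind of case a dedicated domain parser exists to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// parts is a hypothetical result type; its field names mirror the example above.
type parts struct {
	Sub, Root, TopLevel string
}

// naiveSplit treats the last label as the TLD and the label before it as the
// root domain. Everything to the left becomes the subdomain.
// Assumption: single-label TLDs only (no suffix list is consulted).
func naiveSplit(domain string) parts {
	labels := strings.Split(strings.Trim(domain, "."), ".")
	n := len(labels)

	if n == 1 {
		// A bare label such as "localhost" has no TLD to separate.
		return parts{Root: labels[0]}
	}

	return parts{
		Sub:      strings.Join(labels[:n-2], "."),
		Root:     labels[n-2],
		TopLevel: labels[n-1],
	}
}

func main() {
	p := naiveSplit("subdomain.example.com")

	fmt.Printf("Subdomain: %s, Root Domain: %s, TLD: %s\n", p.Sub, p.Root, p.TopLevel)
}
```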
package main

import (
	"fmt"

	"github.com/hueristiq/hqgourl"
)

func main() {
	up := hqgourl.NewURLParser()

	parsedURL, err := up.Parse("https://subdomain.example.com:8080/path/file.txt")
	if err != nil {
		fmt.Println("Error parsing URL:", err)

		return
	}

	// Fields added on top of the standard net/url result.
	fmt.Printf("Subdomain: %s\n", parsedURL.Domain.Sub)
	fmt.Printf("Root Domain: %s\n", parsedURL.Domain.Root)
	fmt.Printf("TLD: %s\n", parsedURL.Domain.TopLevel)
	fmt.Printf("Port: %d\n", parsedURL.Port)
	fmt.Printf("File Extension: %s\n", parsedURL.Extension)
}
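For comparison, the port and file extension can be derived with the standard library alone, which gives a sense of what the additional fields save you from writing. The helper portAndExt is hypothetical, and whether hqgourl's Extension field includes the leading dot (as path.Ext does) is not stated here, so treat that detail as an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strconv"
)

// portAndExt is a hypothetical helper that returns the numeric port
// (0 when the URL has none) and the extension of the URL path.
func portAndExt(raw string) (int, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return 0, "", err
	}

	port := 0

	// u.Port() returns the port as a string, or "" when absent.
	if p := u.Port(); p != "" {
		port, err = strconv.Atoi(p)
		if err != nil {
			return 0, "", err
		}
	}

	// path.Ext returns the extension including the leading dot.
	return port, path.Ext(u.Path), nil
}

func main() {
	port, ext, err := portAndExt("https://subdomain.example.com:8080/path/file.txt")
	if err != nil {
		fmt.Println("Error parsing URL:", err)

		return
	}

	fmt.Printf("Port: %d\nFile Extension: %s\n", port, ext)
}
```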
Set a default scheme:
up := hqgourl.NewURLParser(hqgourl.URLParserWithDefaultScheme("https"))
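A default scheme is useful because the standard url.Parse treats a scheme-less input such as example.com/path as a relative path and leaves the host empty. The sketch below shows one way to compensate with the standard library; the helper withDefaultScheme is hypothetical, and how hqgourl applies the default internally may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// withDefaultScheme is a hypothetical helper: it prepends scheme to raw
// when raw does not already contain a "://" separator.
func withDefaultScheme(raw, scheme string) string {
	if strings.Contains(raw, "://") {
		return raw
	}

	// Also accept protocol-relative input such as "//example.com".
	return scheme + "://" + strings.TrimPrefix(raw, "//")
}

func main() {
	u, err := url.Parse(withDefaultScheme("example.com/path", "https"))
	if err != nil {
		fmt.Println("Error parsing URL:", err)

		return
	}

	fmt.Println(u.Scheme, u.Host, u.Path)
}
```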
Issues and Pull Requests are welcome! Check out the contribution guidelines.
This utility is distributed under the MIT license.
Thanks to the amazing contributors for keeping this project alive.
Thanks to these similar open source projects. Check them out; they may fit your needs.
DomainParser ◇ urlx ◇ xurls ◇ goware's tldomains ◇ jakewarren's tldomains