CLI Utility

Wes edited this page Apr 25, 2018 · 7 revisions

Invoke the CLI utility with the following command:

 docker-compose exec cli streamingphish

Users should immediately be presented with the main menu:

wes@phishtest-4:~$ sudo docker-compose exec cli streamingphish                

   _____ __                            _
  / ___// /_________  ____ _____ ___  (_)
  \__ \/ __/ ___/ _ \/ __ `/ __ `__ \/ / __ \/ __ `/
 ___/ / /_/ /  /  __/ /_/ / / / / / / / / / / /_/ /
/____/\__/_/   \___/\__,_/_/ /_/ /_/_/_/ /_/\__, /
    ____  __    _      __                  /____/
   / __ \/ /_  (_)____/ /
  / /_/ / __ \/ / ___/ __ \
 / ____/ / / / (__  ) / / /
/_/   /_/ /_/_/____/_/ /_/         by Wes Connell
                                      @wesleyraptor


1. Deploy phishing classifier against certstream feed.
2. Operate phishing classifier in manual mode.
3. Manage classifiers (list active classifier and show available classifiers).
4. Train a new classifier.
5. Print configuration.
6. Exit.
  
Please make a selection [1-6]:

Running against CertStream

Select option 1 to run the active classifier against the Certificate Transparency log network. If no classifier has been trained yet, an error message is displayed; in that case, select option 4 to train a classifier, then try again.

Please make a selection [1-6]: 1
[*] Fetching active classifier name from config.
[*] Fetching classifier artifacts from database.
[+] Loaded feature extractor.
[+] Loaded 4_22_v1 classifier.
[*] Analysis started - press CTRL+C to quit at anytime.
[cPanel, Inc.] [HIGH] [SCORE:1.000] cpanel.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] mail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webdisk.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webmail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] www.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] cpanel.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] mail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webdisk.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webmail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] www.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [SUSPICIOUS] [SCORE:0.841] amazon-services-com.gq
[cPanel, Inc.] [SUSPICIOUS] [SCORE:0.841] www.amazon-services-com.gq
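
If you want to post-process this output (for example, to collect the flagged FQDNs), the scored lines can be parsed with a regular expression. The following is a minimal sketch, assuming every scored line follows the `[issuer] [LEVEL] [SCORE:x.xxx] fqdn` layout seen in the sample above; the names `LINE_RE`, `Detection`, and `parse_detection` are hypothetical and not part of streamingphish.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output only (an assumption):
# [issuer] [LEVEL] [SCORE:x.xxx] fqdn
LINE_RE = re.compile(
    r"^\[(?P<issuer>[^\]]*)\] "
    r"\[(?P<level>[A-Z]+)\] "
    r"\[SCORE:(?P<score>\d\.\d+)\] "
    r"(?P<fqdn>\S+)$"
)


class Detection(NamedTuple):
    issuer: str
    level: str
    score: float
    fqdn: str


def parse_detection(line: str) -> Optional[Detection]:
    """Return a Detection for a scored line, or None for status lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Detection(
        match["issuer"], match["level"], float(match["score"]), match["fqdn"]
    )


if __name__ == "__main__":
    lines = [
        "[*] Analysis started - press CTRL+C to quit at anytime.",
        "[cPanel, Inc.] [SUSPICIOUS] [SCORE:0.841] amazon-services-com.gq",
        "[cPanel, Inc.] [SUSPICIOUS] [SCORE:0.841] amazon-services-com.gq",
    ]
    # The feed can report the same FQDN more than once, so deduplicate.
    unique = {d.fqdn for d in map(parse_detection, lines) if d is not None}
    print(sorted(unique))
```

Status lines such as `[*] Fetching active classifier name from config.` do not match the pattern and are skipped.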

Running in Manual Mode

Select option 2 to run the active classifier in manual mode, where you type FQDNs, hosts, or URLs at the prompt and the active classifier scores each one.

Please make a selection [1-6]: 2
[*] Fetching active classifier name from config.
[*] Fetching classifier artifacts from database.
[+] Loaded feature extractor.
[+] Loaded 4_22_v1 classifier.
[+] Deploying in manual mode. Type 'exit' or 'quit' at any time to return to the main menu.
FQDN/Host/URL: chasebnk-com.ml
[PHISHING]: 0.976
FQDN/Host/URL: apppleid.support-forgot.reset-password.mweiewjsdfewt.com
[PHISHING]: 0.969
FQDN/Host/URL: apple.com
[NOT PHISHING]: 0.002
FQDN/Host/URL: paypal.org
[NOT PHISHING]: 0.002

Classifier Management

Select option 3 of the main menu to view a summary of performance metrics from all trained classifiers, change the active classifier, or delete a trained classifier. The classifier management menu looks like this:

Please make a selection [1-6]: 3
[+] Active classifier: better_training_data
[+] Other available classifiers:
        - wesley_v1
        - wesley_test_v2
        - who_dat
        - no_fqdn_keywords

1. Summarize accuracy metrics across all trained classifiers.
2. Show performance metrics from a single classifier.
3. Change the active classifier.
4. Delete a classifier.
5. Return to the main menu.

Accuracy Metrics - All Trained Classifiers

Select option 1 of the classifier management menu to see a summary of accuracy metrics for all trained classifiers. The purpose of training additional classifiers is to explore how changes to the independent variables affect classifier performance (e.g. adding new training data, expanding or reducing features, using different algorithms, or using different algorithm parameters). One of the perks of building the application with docker-compose is that classifiers don't disappear when you make code changes and rebuild the cli container, because they are persisted to the db container.

Please make a selection [1-5]: 1
[+] Summary of classifier accuracy metrics:

[--- Test Set Accuracy ---]
0.9964  wesley_v1
0.9948  no_fqdn_keywords
0.9944  better_training_data
0.9944  wesley_test_v2
0.9936  no_tlds_included

[--- AUC [50%] ---]
0.9964  wesley_v1
0.9948  no_fqdn_keywords
0.9944  better_training_data
0.9944  wesley_test_v2
0.9935  no_tlds_included

[--- Recall [50%] ---]
0.9952  wesley_v1
0.9936  no_fqdn_keywords
0.9928  better_training_data
0.9928  wesley_test_v2
0.9902  no_tlds_included

[--- Precision [50%] ---]
0.9976  wesley_v1
0.9968  better_training_data
0.9968  wesley_test_v2
0.9959  no_tlds_included
0.9952  no_fqdn_keywords

[--- Feature Vector Size ---]
467     wesley_v1
465     better_training_data
465     wesley_test_v2
465     no_fqdn_keywords
414     no_tlds_included

[--- Training Set Accuracy ---]
0.9992  better_training_data
0.9992  wesley_test_v2
0.9989  no_fqdn_keywords
0.9988  wesley_v1
0.9968  no_tlds_included
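
Each block in the summary lists the classifiers from the highest value to the lowest for one metric. The ordering can be sketched as below; the values are copied from the test set accuracy block above, while the dictionary layout and the `rank` helper are hypothetical and say nothing about how streamingphish stores its metrics.

```python
# Values copied from the sample summary; the dict layout is an assumption.
test_set_accuracy = {
    "better_training_data": 0.9944,
    "no_fqdn_keywords": 0.9948,
    "no_tlds_included": 0.9936,
    "wesley_test_v2": 0.9944,
    "wesley_v1": 0.9964,
}


def rank(metric):
    """Return (name, value) pairs sorted from the highest value to the lowest."""
    return sorted(metric.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank(test_set_accuracy):
        print(f"{value:.4f}  {name}")
```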

Accuracy Metrics - Single Classifier

Select option 2 of the classifier management menu to view the performance metrics for a single trained classifier:

Please make a selection [1-5]: 2
Please enter the name of the classifier you want accuracy metrics from: sample_classifier
[+] Accuracy metrics for classifier sample_classifier:
{
    "accuracy": {
        "true_positive_rate": "0.9893",
        "precision": "0.9874",
        "confusion_matrix": [
            [
                1006,
                13
            ],
            [
                13,
                1016
            ]
        ],
        "recall": "0.9883",
        "false_positive_rate": "0.0128",
        "test_set_accuracy": "0.9873",
        "auc_score": "0.9873",
        "training_set_accuracy": "0.9958"
    },
    "info": {
        "algorithm": "LogisticRegression",
        "training_samples": {
            "phishing": 4170,
            "not_phishing": 4019
        },
        "training_date": "2018-04-24 22:50:10.094880",
        "feature_vector_size": 348,
        "parameters": {
            "dual": false,
            "fit_intercept": true,
            "n_jobs": 1,
            "class_weight": null,
            "max_iter": 100,
            "random_state": null,
            "warm_start": false,
            "solver": "liblinear",
            "verbose": 0,
            "intercept_scaling": 1,
            "tol": 0.0001,
            "C": 10,
            "penalty": "l2",
            "multi_class": "ovr"
        }
    }
}
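
For readers who want to relate the confusion matrix to the other numbers, here is a rough sketch of the usual formulas. It assumes the matrix uses the scikit-learn layout `[[TN, FP], [FN, TP]]`, which this page does not confirm; the function name is hypothetical. Under that assumption, the precision, false positive rate, and test set accuracy computed from the sample matrix agree with the values shown above, while the recall does not match exactly, so treat the layout as a plausible reading rather than a statement of how the tool computes its metrics.

```python
def metrics_from_confusion_matrix(matrix):
    """Derive common metrics from a 2x2 confusion matrix.

    Assumed layout (scikit-learn convention): [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
    }


if __name__ == "__main__":
    # Matrix copied from the sample output above.
    sample = [[1006, 13], [13, 1016]]
    for name, value in metrics_from_confusion_matrix(sample).items():
        print(f"{name}: {value:.4f}")
```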

Changing the Active Classifier

Select option 3 of the classifier management menu to change the active classifier when more than one classifier is available in the database:

Please make a selection [1-5]: 3
Please enter the name of the classifier you'd like to activate: newest_classifier
[+] Activated new classifier, newest_classifier, in configuration.
[+] Active classifier: newest_classifier

Deleting a Classifier

Select option 4 of the classifier management menu to delete a trained classifier:

Please make a selection [1-5]: 4
Please enter the name of the classifier you'd like to delete: newest_classifier
[+] Deleted classifier newest_classifier.

Training a Classifier

The system doesn't include any trained classifiers by default, so select option 4 from the main menu to train one. The metrics for the trained classifier are printed to the screen as soon as training completes (if you're unfamiliar with what the metrics mean, take a look at the accompanying Jupyter notebook). Follow the prompts to save the classifier, give it a name, and activate it:

Please make a selection [1-6]: 4
[*] Loading benign data.
[*] Loading malicious data.
[+] Completed loading training data.
[*] Computing features...
[+] Training complete.
[*] Computing classifier metrics...
[+] Classifier metrics available.
The metrics from the newly trained classifier are as follows:
{
    "info": {
        "feature_vector_size": 467,
        "training_samples": {
            "phishing": 5000,
            "not_phishing": 5000
        },
        "parameters": {
            "penalty": "l2",
            "solver": "liblinear",
            "C": 10,
            "multi_class": "ovr",
            "intercept_scaling": 1,
            "n_jobs": 1,
            "class_weight": null,
            "fit_intercept": true,
            "tol": 0.0001,
            "warm_start": false,
            "verbose": 0,
            "random_state": null,
            "dual": false,
            "max_iter": 100
        },
        "training_date": "2018-03-27 07:18:47.759669",
        "algorithm": "LogisticRegression"
    },
    "accuracy": {
        "precision": "0.9959",
        "training_set_accuracy": "0.9991",
        "auc_score": "0.9915",
        "recall": "0.9871",
        "test_set_accuracy": "0.9916",
        "false_positive_rate": "0.0040",
        "true_positive_rate": "0.9903"
    }
}
Would you like to keep the classifier? [y/N] y
Please enter a name (no spaces) for the classifier: wesley_test_v1
[+] Saved new classifier wesley_test_v1.
Would you like to activate the classifier? [y/N] y
[+] Activated new classifier, wesley_test_v1, in configuration.

Retraining a Classifier

Training a new classifier might be necessary for several reasons:

  • Exploring new features to extract from FQDNs.
  • Updating the keywords, brands, or TLDs in the training_data folder.
  • Updating the training sets - perhaps correcting false positives from running against the Certificate Transparency log network.

Keyword and Training Data Changes

The training_data folder is bind-mounted to the host, so updating any of the data in it doesn't require rebuilding the cli container before retraining. Select option 4 in the main menu to train another classifier, give it a unique name, and activate it. The new classifier, along with any previously trained classifiers, is persisted to the database container.

Feature Extraction Changes

Changes to the feature extraction code (i.e. anything in cli/streamingphish/streamingphish/features.py) require rebuilding the cli container and then selecting option 4 in the main menu. As noted earlier, previously trained classifiers are not lost, because they are persisted to the db container. Trained classifiers will only be lost if the db container goes down.

By default, features are extracted by every method in cli/streamingphish/streamingphish/features.py whose name starts with _fe_. Removing, adding, or updating any of these methods warrants a retrain. Each method returns a dictionary, and each is well documented in the source. The initial feature extraction methods are as follows:

    def _fe_extract_tld(self, sample)
    def _fe_brand_presence(self, sample)
    def _fe_keyword_match(self, sample)
    def _fe_keyword_match_fqdn_words(self, sample)
    def _fe_compute_domain_entropy(sample)
    def _fe_check_phishing_similarity_words(self, sample)
    def _fe_number_of_dashes(sample)
    def _fe_number_of_periods(sample)
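
The `_fe_` naming convention described above can be illustrated with a small, self-contained sketch. This is not the code in features.py: the class name, the `compute_features` method, the dictionary keys, and the method bodies are all hypothetical, chosen only to show how methods with a given prefix might be discovered and their dictionaries merged into one feature set.

```python
import math
from collections import Counter


class FeatureExtractorSketch:
    """Hypothetical stand-in, not the project's actual feature extractor."""

    @staticmethod
    def _fe_number_of_dashes(sample):
        # Key name is an assumption made for this sketch.
        return {"num_dashes": sample.count("-")}

    @staticmethod
    def _fe_number_of_periods(sample):
        return {"num_periods": sample.count(".")}

    @staticmethod
    def _fe_compute_domain_entropy(sample):
        # Shannon entropy over the characters of the sample.
        counts = Counter(sample)
        total = len(sample)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        return {"entropy": entropy}

    def compute_features(self, sample):
        """Call every method whose name starts with _fe_ and merge the results."""
        features = {}
        for name in sorted(dir(self)):
            if name.startswith("_fe_"):
                features.update(getattr(self, name)(sample))
        return features


if __name__ == "__main__":
    extractor = FeatureExtractorSketch()
    print(extractor.compute_features("amazon-services-com.gq"))
```

With this kind of design, adding a new `_fe_` method is enough for it to be picked up, which is also why any such change calls for a retrain: the feature vector the classifier was trained on no longer matches what the extractor produces.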

Rebuilding the cli container after modifying cli/streamingphish/streamingphish/features.py can be done with the following command:

 sudo docker-compose up -d --build

The db and notebook containers should remain unchanged, whereas the cli container should be rebuilt.