FishEye International, a non-profit focused on countering illegal, unreported, and unregulated (IUU) fishing, has been given access to an international finance corporation’s database on fishing-related companies. In the past, FishEye has determined that companies with anomalous structures are far more likely to be involved in IUU (or other fishy business). FishEye has transformed the database into a knowledge graph, including information about companies, owners, workers, and financial status. FishEye aims to use this graph to identify anomalies that could indicate whether a company is involved in IUU.
Project Objective:
This study aims to use visual analytics to
measure similarity of business groups
provide evidence for and against the case that anomalous companies are involved in illegal fishing
This will be investigated through both visual and quantitative means. Visualising the networks of the various industries present in the knowledge graph could help identify whether the anomalous network structures uncovered in Fishy Business are more prominent or present in certain industries. Network similarity measures will also be used to quantify these similarities from the graph objects.
%%{
init: {
"theme": "base",
"themeVariables": {
"primaryColor": "#d8e8e6",
"primaryTextColor": "#325985",
"primaryBorderColor": "#325985",
"lineColor": "#325985",
"secondaryColor": "#cedded",
"tertiaryColor": "#fff"
}
}
}%%
flowchart LR
A{Overall\nNetwork} --> B{Industries}
B --> C(Visual Similarities)
B --> D(Similarity Measures) -.-> E[Jaccard Similarity]
D -.-> F[Orbit Distribution]
C -.->|Similar?|H[Network Structures] --- I[Overlapping\nOwnership]
H --- J[Connected\nClusters]
2 Data Preparation
2.1 R Packages
The following packages are used for this study:
tidyverse, a collection of packages for data analysis (particularly dplyr for data manipulation)
ggraph, igraph and visNetwork for generating network graph visuals
The following groups were derived from topic modeling of products and services offered by companies in the data:
| Topic | Industry | Description |
|---|---|---|
| 1 | Industrial | Manages equipment, machinery and other industrial materials |
| 2 | Food | Vegetables, meat, fruits and other groceries |
| 3 | Seafood-processing | Packaging, canning, manufacturing of marine or seafood products |
| 4 | Consumer-goods | Non-fishing related accessories, furniture, apparel |
| 5 | Transport-logistics | Companies specialising in logistics, freight, cargo services |
| 6 | Fishing | Companies directly related to fishing of salmon, tuna, etc. |
Some companies had a high probability of being classified under more than one industry. These were grouped together as a separate category, multi-industry companies.
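The grouping rule can be illustrated with a minimal sketch. The probabilities, company names and the 0.30 cut-off below are hypothetical (the source does not state the threshold used), so this shows only one plausible way such a rule could work:

```r
# Hypothetical document-topic probabilities for three companies
# (rows = companies, columns = the six industries in the table above).
industries <- c("Industrial", "Food", "Seafood-processing",
                "Consumer-goods", "Transport-logistics", "Fishing")

topic_probs <- matrix(
  c(0.80, 0.05, 0.05, 0.04, 0.03, 0.03,   # clearly Industrial
    0.02, 0.03, 0.45, 0.02, 0.03, 0.45,   # split between two industries
    0.05, 0.70, 0.10, 0.05, 0.05, 0.05),  # clearly Food
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Company A", "Company B", "Company C"), industries)
)

# Assumed rule: a company is "Multi-industry" when two or more topics
# reach the cut-off; otherwise it takes its most probable topic.
assign_industry <- function(probs, cutoff = 0.30) {
  apply(probs, 1, function(p) {
    if (sum(p >= cutoff) >= 2) "Multi-industry" else names(p)[which.max(p)]
  })
}

industry_labels <- assign_industry(topic_probs)
print(industry_labels)
```

Under these assumed values, Company B is the only one placed in the multi-industry group, because two of its topic probabilities reach the cut-off.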
# Measure degree centrality and save as a column
V(industrial_graph)$degree <- degree(industrial_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Industrial Company" = "#2980B9"
)

set.seed(1234)

industrial_network <- industrial_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    # Ensure that UBO and Multi-role entity nodes can be clearly seen by making node size larger
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  # Only label companies in top 25th percentile of degree measure
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Industrial Company Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

industrial_network
# Measure degree centrality and save as a column
V(food_graph)$degree <- degree(food_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Food Company" = "grey40"
)

set.seed(1234)

food_network <- food_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Food Industry Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

food_network
# Measure degree centrality and save as a column
V(seafood_graph)$degree <- degree(seafood_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Seafood-processing Company" = "#B78EA4"
)

set.seed(1234)

seafood_network <- seafood_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Seafood-Processing Industry Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

seafood_network
# Measure degree centrality and save as a column
V(goods_graph)$degree <- degree(goods_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Consumer-goods Company" = "#138D75"
)

set.seed(1234)

goods_network <- goods_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Consumer Goods Industry Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

goods_network
# Measure degree centrality and save as a column
V(transport_graph)$degree <- degree(transport_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Transport-logistics Company" = "#85929E"
)

set.seed(1234)

transport_network <- transport_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Transport and Logistics Industry Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

transport_network
# Measure degree centrality and save as a column
V(fishing_graph)$degree <- degree(fishing_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Fishing-related Company" = "#629DBB"
)

set.seed(1234)

fishing_network <- fishing_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Fishing-related Industry Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

fishing_network
# Measure degree centrality and save as a column
V(multi_graph)$degree <- degree(multi_graph, mode = "all")

# Define custom colors for each group
group_colors <- c(
  "Ultimate Beneficial Owner" = "#D86171",
  "Shareholder" = "#163759",
  "Company Contact" = "#F39C12",
  "Multi-role Entity" = "#48C9B0",
  "Multi-Industry Company" = "#766F87"
)

set.seed(1234)

multi_network <- multi_graph %>%
  ggraph(layout = "nicely") +
  geom_edge_fan(alpha = .6, show.legend = FALSE) +
  scale_edge_width(range = c(0.1, 4)) +
  geom_node_point(
    aes(size = ifelse(group == "Ultimate Beneficial Owner" | group == "Multi-role Entity", 3, degree),
        color = group),
    alpha = .9
  ) +
  # Remove the legend for "degree"
  guides(color = guide_legend(title = "Role:"), size = "none") +
  # Assign custom colors to each group to standardise
  scale_color_manual(values = group_colors) +
  geom_node_text(
    aes(label = ifelse(degree > quantile(degree, .75), id, "")),
    size = 2, repel = TRUE
  ) +
  labs(title = "Multi-industry Company Network") +
  theme(
    plot.title = element_text(size = 16, color = "grey20"),
    legend.title = element_text(),
    legend.position = "bottom",
    # legend.direction = "horizontal",
    legend.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    panel.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.background = element_rect(fill = "#dfdfeb", colour = "#dfdfeb"),
    plot.margin = margin(r = 10, l = 10)
  )

multi_network
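The labelling rule used in the network plots (only nodes whose degree exceeds the 75th percentile receive a label) can be tried on its own. The sketch below uses base R and a small hypothetical edge list rather than the study's graph objects, so the node names and counts are purely illustrative:

```r
# Hypothetical edge list: each row links an entity to a company.
edges <- data.frame(
  from = c("Owner 1", "Owner 2", "Owner 3", "Owner 4", "Owner 1", "Contact 1"),
  to   = c("Company X", "Company X", "Company X", "Company X", "Company Y", "Company Y"),
  stringsAsFactors = FALSE
)

# Degree with mode = "all" counts every edge touching a node,
# so tabulating both endpoint columns gives the same figure here.
degree_tbl <- table(c(edges$from, edges$to))
node_degree <- setNames(as.integer(degree_tbl), names(degree_tbl))

# Same rule as in the plots: label only nodes above the 75th percentile.
cutoff <- quantile(node_degree, 0.75)
labels <- ifelse(node_degree > cutoff, names(node_degree), "")

print(node_degree)
print(labels[labels != ""])
```

Because the comparison is strict, nodes whose degree equals the 75th percentile stay unlabelled, which keeps dense plots readable at the cost of hiding some mid-sized hubs.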
Initial Insights from Visualisations:
The networks of Multi-industry & Consumer Goods feature Ultimate Beneficial Owners, as well as multiple companies with large numbers of shareholders within the network structure
Many entities within the network structures have only a single link, which suggests that their remaining connections are to companies in other industries.
4 Quantifying Network Similarity
4.1 Jaccard Similarity Index
The Jaccard similarity index compares the presence or absence of edges (links) between pairs of nodes in the networks. It is the ratio of the number of edges common to both networks to the number of distinct edges across the two networks (intersection over union). A higher Jaccard Similarity score thus suggests a greater overlap in edge connections, which makes the networks relatively more similar in terms of overall connectivity.
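As a small worked example of the ratio, the sketch below computes the index for two hypothetical edge lists in base R. Each edge is keyed by its endpoint names so that it can be matched across networks; this is an illustrative choice, since igraph's E() numbers edges separately within each graph. For igraph objects, a comparable edge list could be obtained with as_edgelist().

```r
# Two small hypothetical edge lists (entity -> company).
edges_a <- data.frame(
  from = c("Owner 1", "Owner 2", "Owner 3"),
  to   = c("Company X", "Company X", "Company Y"),
  stringsAsFactors = FALSE
)
edges_b <- data.frame(
  from = c("Owner 1", "Owner 3", "Owner 4"),
  to   = c("Company X", "Company Y", "Company Z"),
  stringsAsFactors = FALSE
)

# Identify each edge by its endpoint names so that edges can be
# matched across networks regardless of their row position.
edge_keys <- function(edges) unique(paste(edges$from, edges$to, sep = " -> "))

jaccard_edges <- function(edges1, edges2) {
  k1 <- edge_keys(edges1)
  k2 <- edge_keys(edges2)
  length(intersect(k1, k2)) / length(union(k1, k2))
}

# 2 shared edges out of 4 distinct edges across both lists
jaccard_edges(edges_a, edges_b)
```

Here the two lists share two edges and contain four distinct edges in total, so the index is 0.5.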
The measure is calculated for each pair of Industry-classified networks, with the data stored as a matrix. This is then visualised through a heatmap:
# Store all graph objects as a list
all_industries <- list(industrial_graph, food_graph, seafood_graph, goods_graph,
                       transport_graph, fishing_graph, multi_graph)

# Create function to calculate jaccard similarity between two network graphs
# Caution: E() numbers edges within each graph, so intersect() on two edge
# sequences matches edge indices rather than endpoint names.
jaccard_sim <- function(graph1, graph2) {
  common_edges <- intersect(E(graph1), E(graph2))
  jaccard_similarity <- length(common_edges) /
    (length(E(graph1)) + length(E(graph2)) - length(common_edges))
  return(jaccard_similarity)
}

# Loop through all network graphs and calculate jaccard similarity
jaccard_similarity <- matrix(0, nrow = length(all_industries), ncol = length(all_industries))
for (i in seq_along(all_industries)) {
  for (j in seq_along(all_industries)) {
    jaccard_similarity[i, j] <- jaccard_sim(all_industries[[i]], all_industries[[j]])
  }
}

# Create dataframe with jaccard similarity matrix
jaccard_similarity_df <- as.data.frame(jaccard_similarity)

# Assign row and column names according to industry
network_titles <- c("Industrial", "Food", "Seafood", "Goods", "Transport", "Fishing", "Multi-industry")
rownames(jaccard_similarity_df) <- network_titles
colnames(jaccard_similarity_df) <- network_titles

pacman::p_load(kableExtra)  # provides kbl() and kable_styling()
kbl(jaccard_similarity_df, caption = "Jaccard Similarity Index Comparing Network Graphs") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Jaccard Similarity Index Comparing Network Graphs

| | Industrial | Food | Seafood | Goods | Transport | Fishing | Multi-industry |
|---|---|---|---|---|---|---|---|
| Industrial | 1.0000000 | 0.6138614 | 0.9462366 | 0.7294118 | 0.8017241 | 0.9731183 | 0.7209302 |
| Food | 0.6138614 | 1.0000000 | 0.5808581 | 0.8415842 | 0.7656766 | 0.5973597 | 0.8514851 |
| Seafood | 0.9462366 | 0.5808581 | 1.0000000 | 0.6901961 | 0.7586207 | 0.9723757 | 0.6821705 |
| Goods | 0.7294118 | 0.8415842 | 0.6901961 | 1.0000000 | 0.9098039 | 0.7098039 | 0.9883721 |
| Transport | 0.8017241 | 0.7656766 | 0.7586207 | 0.9098039 | 1.0000000 | 0.7801724 | 0.8992248 |
| Fishing | 0.9731183 | 0.5973597 | 0.9723757 | 0.7098039 | 0.7801724 | 1.0000000 | 0.7015504 |
| Multi-industry | 0.7209302 | 0.8514851 | 0.6821705 | 0.9883721 | 0.8992248 | 0.7015504 | 1.0000000 |
# Visualize the Jaccard similarity matrix as a heatmap
pacman::p_load(heatmaply)

heatmaply(
  as.matrix(jaccard_similarity_df),
  colors = Blues,
  seriate = "OLO",
  main = "Jaccard Similarity Scores Comparing Industry Networks",
  margins = c(NA, 200, 60, NA)
) %>%
  plotly::layout(plot_bgcolor = "#dfdfeb", paper_bgcolor = "#dfdfeb")
4.2 Orbit Distribution Agreement (ODA)
This measure provides a way to compare the local structural patterns between networks, focusing on the distribution of orbit types. It captures the similarity or dissimilarity in the occurrence of specific subgraph patterns within the network, which can provide insights into their functional properties, evolution, or dynamics.
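To give a feel for how a distribution-agreement score behaves, the sketch below computes an agreement for the simplest orbit only (orbit 0, the node degree). It follows the per-orbit agreement used in graphlet degree distribution comparisons: scale each count by 1/k, normalise, then take one minus a scaled Euclidean distance. It is a simplified illustration on hypothetical degree sequences, not a reproduction of netODA, whose internals are not shown here.

```r
# Agreement between two degree sequences, following the single-orbit
# agreement formula from the graphlet-degree-distribution literature.
orbit0_agreement <- function(deg1, deg2) {
  normalised <- function(deg, k_max) {
    deg <- deg[deg > 0]                       # degree 0 is excluded (k starts at 1)
    counts <- tabulate(deg, nbins = k_max)    # d(k): number of nodes with degree k
    scaled <- counts / seq_len(k_max)         # S(k) = d(k) / k
    scaled / sum(scaled)                      # N(k) = S(k) / sum of S
  }
  k_max <- max(deg1, deg2)
  n1 <- normalised(deg1, k_max)
  n2 <- normalised(deg2, k_max)
  1 - sqrt(sum((n1 - n2)^2)) / sqrt(2)        # 1 means identical distributions
}

# Hypothetical degree sequences: two star-like networks and one chain.
star_small <- c(3, 1, 1, 1)
star_large <- c(5, 1, 1, 1, 1, 1)
chain      <- c(1, 2, 2, 2, 1)

orbit0_agreement(star_small, star_small)
orbit0_agreement(star_small, star_large)
orbit0_agreement(star_small, chain)
```

The two star-like sequences agree more closely with each other than the star does with the chain, even though the stars differ in size. This is the kind of local structural similarity, independent of shared node identities, that an orbit-based measure is meant to capture.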
# Store all graph objects as a list
all_industries <- list(industrial_graph, food_graph, seafood_graph, goods_graph,
                       transport_graph, fishing_graph, multi_graph)

# Create function to calculate ODA between two network graphs
net_oda <- function(graph1, graph2) {
  calc <- netODA(graph1, graph2)
  return(calc)
}

# Loop through all network graphs and calculate net ODA
netOda_matrix <- matrix(0, nrow = length(all_industries), ncol = length(all_industries))
for (i in seq_along(all_industries)) {
  for (j in seq_along(all_industries)) {
    netOda_matrix[i, j] <- net_oda(all_industries[[i]], all_industries[[j]])
  }
}

# Create dataframe with ODA matrix
netOda_df <- as.data.frame(netOda_matrix)

# Assign row and column names according to industry
rownames(netOda_df) <- network_titles
colnames(netOda_df) <- network_titles

kbl(netOda_df, caption = "Orbit Distribution Agreement of Network Graphs") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Orbit Distribution Agreement of Network Graphs

| | Industrial | Food | Seafood | Goods | Transport | Fishing | Multi-industry |
|---|---|---|---|---|---|---|---|
| Industrial | 1.0000000 | 0.9061240 | 0.9733774 | 0.8688935 | 0.9069822 | 0.9886228 | 0.8644830 |
| Food | 0.9061240 | 1.0000000 | 0.9015054 | 0.8491589 | 0.8940284 | 0.9127471 | 0.8574863 |
| Seafood | 0.9733774 | 0.9015054 | 1.0000000 | 0.8503657 | 0.8996167 | 0.9820156 | 0.8691342 |
| Goods | 0.8688935 | 0.8491589 | 0.8503657 | 1.0000000 | 0.8497719 | 0.8661265 | 0.8064378 |
| Transport | 0.9069822 | 0.8940284 | 0.8996167 | 0.8497719 | 1.0000000 | 0.9126474 | 0.8539392 |
| Fishing | 0.9886228 | 0.9127471 | 0.9820156 | 0.8661265 | 0.9126474 | 1.0000000 | 0.8730766 |
| Multi-industry | 0.8644830 | 0.8574863 | 0.8691342 | 0.8064378 | 0.8539392 | 0.8730766 | 1.0000000 |
# Visualize the net ODA matrix as a heatmap
heatmaply(
  as.matrix(netOda_df),
  colors = Greens,
  seriate = "OLO",
  main = "Orbit Distribution Agreement Across Industry Networks",
  margins = c(NA, 200, 60, NA)
) %>%
  plotly::layout(plot_bgcolor = "#dfdfeb", paper_bgcolor = "#dfdfeb")
Similarities in Network Structures Across Industries:
Similarities in network structure can be read from both the Jaccard Similarity (JS) and Orbit Distribution Agreement (ODA) scores presented in the heatmaps. The lighter the tile, the less similar the two ‘groups’ are.
Highly similar industries:
The Fishing & Seafood-processing industries have very high JS and ODA scores (0.97 and 0.98 respectively), suggestive of almost identical network structures.
The Seafood-processing & Industrial and Fishing & Industrial pairs also scored close to 1.0 on both measures (JS of 0.95 and 0.97; ODA of 0.97 and 0.99). This suggests that these industries may be overlapping, highly related or interconnected with each other.
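Reading the most similar pairs off a heatmap can be cross-checked by ranking the off-diagonal entries of the score matrix. The sketch below does this in base R for four of the seven industries, using ODA scores copied from the table above; the variable names are illustrative.

```r
industries <- c("Industrial", "Food", "Seafood", "Fishing")

# ODA scores copied from the table above for four of the seven industries.
oda <- matrix(
  c(1.0000000, 0.9061240, 0.9733774, 0.9886228,
    0.9061240, 1.0000000, 0.9015054, 0.9127471,
    0.9733774, 0.9015054, 1.0000000, 0.9820156,
    0.9886228, 0.9127471, 0.9820156, 1.0000000),
  nrow = 4, byrow = TRUE,
  dimnames = list(industries, industries)
)

# Keep each unordered pair once by reading only the upper triangle.
idx <- which(upper.tri(oda), arr.ind = TRUE)
pairs <- data.frame(
  industry_1 = rownames(oda)[idx[, "row"]],
  industry_2 = colnames(oda)[idx[, "col"]],
  score      = oda[idx],
  stringsAsFactors = FALSE
)

# Most similar pairs first.
ranked_pairs <- pairs[order(pairs$score, decreasing = TRUE), ]
print(ranked_pairs, row.names = FALSE)
```

For this subset, the top three pairs are Industrial & Fishing, Seafood & Fishing, and Industrial & Seafood, matching the pairs highlighted above.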
5 A Fishy Network?
Many of the ‘groups’ of companies by industry had high Jaccard Similarity and Orbit Distribution Agreement scores, which suggests that there is significant overlap in the connections and relationships between companies, owners, and contacts across industries in the overall network. The expanded networks are therefore visualised to compare their graphlet structures and to determine whether there are fishy ties between entities across industries.
In particular, networks with high Jaccard similarity could indicate:
Collaboration and Cooperation between industries
Cross-Industry Syndicates that span across multiple industries, coordinating illegal fishing activities and exploiting resources collectively.
Common entities who benefit from illegal fishing practices
A high Orbit Distribution Agreement could also suggest:
Common Network Dynamics or organizational structures. This could involve shared roles, relationships, or coordination mechanisms among key actors involved in illegal fishing.
Coordinated Activities between industries
As such, the following subgroups of networks are visualised for further comparison:
5.1 Industrial, Fishing-related and Seafood-processing Companies
From the analysis thus far, we can surmise that anomalous company structures may be uncovered by examining inter-industry networks: identifying underlying links between companies and looking for overlapping ownership structures. Separate graphlets (clusters) within the network that are linked by shared entities could suggest possible joint ventures. This is most apparent in the Industrial, Fishing and Seafood-processing network, where several pairs of companies (from different industries and sub-clusters) are linked by one or more entities. These companies are:
SeaSolutions Leska Kga Worldwide (Seafood-processing) & Karelians Plc Marine conservation (Industrial), linked by John Williams (Multi-role Entity)
Aqua Aura SE Marine life (Fishing-related) & Irish Mackeral S.A. de C.V. Marine Biology (Seafood-processing), linked by Dale Rhodes & Cynthia Murphy (Shareholders) and Leah Criz & Jilian White (Company contacts)
Seashell Sanctum Co Family (Industrial) & Khundu N V and Son’s (Seafood-processing) linked by Amy Martinez (Shareholder)
Adriatic Tuna Seabass BV Transit (Fishing-related) & Fisher Marine Conservation CJSC Transportation (Industrial), linked by Mathew Mora (Shareholder)
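Pairs like these can also be located programmatically by counting how many distinct industries each entity is linked to. The base R sketch below uses two of the pairs listed above plus one hypothetical single-company contact as a control; the data frame layout and column names are assumptions made for illustration.

```r
# Company-to-industry lookup for some of the flagged companies listed above.
company_industry <- c(
  "SeaSolutions Leska Kga Worldwide"               = "Seafood-processing",
  "Karelians Plc Marine conservation"              = "Industrial",
  "Adriatic Tuna Seabass BV Transit"               = "Fishing-related",
  "Fisher Marine Conservation CJSC Transportation" = "Industrial"
)

# Entity-to-company links; "Hypothetical Contact" is an invented control
# entity that touches only one company.
edges <- data.frame(
  entity  = c("John Williams", "John Williams",
              "Mathew Mora", "Mathew Mora",
              "Hypothetical Contact"),
  company = c("SeaSolutions Leska Kga Worldwide",
              "Karelians Plc Marine conservation",
              "Adriatic Tuna Seabass BV Transit",
              "Fisher Marine Conservation CJSC Transportation",
              "Karelians Plc Marine conservation"),
  stringsAsFactors = FALSE
)
edges$industry <- unname(company_industry[edges$company])

# Count distinct industries reached by each entity.
industries_per_entity <- tapply(edges$industry, edges$entity,
                                function(x) length(unique(x)))

# Entities touching two or more industries act as cross-industry bridges.
bridges <- names(industries_per_entity)[industries_per_entity >= 2]
print(bridges)
```

The same count could be run over the full node and edge tables to surface every bridging entity, not only those visible in the plotted sub-network.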
Of particular interest is the fact that in almost all of these flagged pairs of linked companies, one of the companies has a name that suggests involvement in the study of marine biology or in conservation efforts: Karelians Plc Marine conservation, Irish Mackeral S.A. de C.V. Marine Biology, etc. This is particularly fishy, and reminiscent of the case study on the Japanese whaling industry, where Japanese whalers hunt whales under the guise of ‘scientific research’, exploiting a loophole in International Whaling Commission (IWC) rules that allows whaling for conservation research purposes. Similarly, these fishy structures could be hiding IUU activity under the guise of research or conservation efforts.
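A quick way to screen company names for this pattern is a case-insensitive keyword match. The sketch below applies one to company names mentioned in this section; the keyword list is an assumption and would need tuning for a real investigation.

```r
# Company names taken from the flagged pairs in this section.
companies <- c(
  "SeaSolutions Leska Kga Worldwide",
  "Karelians Plc Marine conservation",
  "Aqua Aura SE Marine life",
  "Irish Mackeral S.A. de C.V. Marine Biology",
  "Seashell Sanctum Co Family",
  "Adriatic Tuna Seabass BV Transit",
  "Fisher Marine Conservation CJSC Transportation"
)

# Assumed keyword list; extend or replace it as the investigation requires.
keywords <- c("conservation", "biology", "research")
pattern <- paste(keywords, collapse = "|")

# Case-insensitive match of any keyword against each company name.
flagged <- companies[grepl(pattern, companies, ignore.case = TRUE)]
print(flagged)
```

A name match alone is weak evidence, so a screen like this is best treated as a way to prioritise which linked pairs to inspect first.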