Learn More About the Categories
There are many areas where you can help. Read the descriptions, watch the presentations by our domain experts, and select one that appeals to you the most.
Graph for a Better Earth
Improve living conditions on Earth by focusing on sustainability
EXPERT FOCUS: Monitor Impact Of Climate Warming In The Arctic
ORGANIZATION: University of Texas at Austin – Institute for Geophysics
Graph for Better Finance
Help people or organizations prosper
Graph for Better Health
Find ways to enable people to live healthier lives
EXPERT FOCUS: Detect Early COVID-19 Mutated Variants
ORGANIZATION: Worcester Polytechnic Institute
Graph for Better Learning
Discover innovations that allow people to learn faster and more effectively
Graph for Better Living
Make life more enjoyable for humanity
Graph for Better Systems
Increase efficiencies across global processes
EXPERT FOCUS: Develop Effective Public Transportation Systems
ORGANIZATION: MUST Research
Graph for a Better World
Tackle social issues to make the world a better place to live
EXPERT FOCUS: Enable Search For United Nations Sustainable Development Goals
ORGANIZATION: Common Action
EXPERT: McKenzie Steenson
FOCUS: Create STEM Opportunities for Women
ORGANIZATION: TigerGraph
EXPERT: Daniel Barkus
FOCUS: Find Ethically-Sourced Goods
ORGANIZATION: TigerGraph
Graph for Better X
Identify a problem you want to tackle with graph analytics and invite others to follow
EXPERT: You
FOCUS: Design Your Problem Statement
Alexey Portnov, PhD
Alexey Portnov is a marine geoscientist at the University of Texas at Austin. He uses geophysical methods to address global geological and environmental questions related to the carbon cycle, past and present climate, geohazards, the evolution of the cryosphere, and more. Before UT Austin, Alexey worked as a postdoctoral scholar at the Ohio State University in the USA and at The Arctic University of Norway, where he earned his PhD. Before moving to Norway, Alexey studied and worked as a geophysicist in his birthplace, Saint Petersburg, Russia. Alexey is broadly interested in scientific programming and science communication, which he pursues through his website and YouTube channel.
Graph for Better Earth: Monitor Impact Of Climate Warming In The Arctic
Author: Alexey Portnov, PhD, Research Associate at the University of Texas at Austin – Institute for Geophysics
PROBLEM STATEMENT
Anthropogenic climate change has accelerated over the past several decades and affects people’s well-being through increasingly catastrophic weather events, rising sea levels, and drought. Particularly vulnerable are the polar regions, where increasing greenhouse gas concentrations result in a mean annual temperature increase 2-3 times higher than in the rest of the world. This effect is called “Arctic amplification,” and it directly harms nature and wildlife through sea ice melting and permafrost thawing. It is also a significant risk factor for the indigenous communities and engineering infrastructure of the polar regions.
One of the critical consequences of the global temperature rise is thawing Arctic permafrost. Permafrost regions are extensive (thousands of kilometres wide), and they hold tremendous amounts of organic carbon, which is released into the atmosphere as the greenhouse gas methane when permafrost disintegrates. This process has recently intensified, and we observe it, for example, through the appearance of explosive gas blow-out craters in the Canadian, US, and Russian Arctic. It is a rapid and hazardous process: a flat (or slightly doming) land surface turns into a deep and wide crater in a matter of hours, and over the next year the crater fills with water and becomes a thermokarst lake.
THE CHALLENGE
Monitoring the emergence of thermokarst lakes is important for understanding the impact of climate warming and assessing geohazard risks in the polar regions. Such craters and lakes (which can be tens to thousands of metres wide) are clearly visible in satellite imagery. The challenge is to use publicly available satellite databases covering the last 20-30 years (depending on image quality and availability) to capture the differences in images before and after lakes appear and to produce a time series for newly formed lakes. The thermokarst lake locations and times of origin should be marked and catalogued, allowing their dynamics to be monitored and fed into various climate models. It makes sense to use mostly summer-month imagery. Potential geographic regions are the Yamal Peninsula, the Tuktoyaktuk Peninsula, or any other Arctic region with available satellite data.
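For illustration, here is a minimal change-detection sketch in Python. It assumes the green and near-infrared bands of two co-registered scenes have already been loaded as NumPy arrays (for example with rasterio); the NDWI threshold and the minimum-pixel filter are illustrative values, not calibrated ones.

```python
# Minimal sketch: flag newly formed lakes by differencing water masks from two
# satellite scenes of the same area (e.g., summer acquisitions years apart).
import numpy as np
from scipy import ndimage

def water_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Normalized Difference Water Index (NDWI): water pixels have high NDWI."""
    ndwi = (green - nir) / (green + nir + 1e-9)
    return ndwi > threshold

def new_lakes(green_t0, nir_t0, green_t1, nir_t1, min_pixels: int = 25):
    """Return regions that are water at t1 but were not water at t0."""
    appeared = water_mask(green_t1, nir_t1) & ~water_mask(green_t0, nir_t0)
    labels, n = ndimage.label(appeared)
    lakes = []
    for region_id in range(1, n + 1):
        pixels = np.argwhere(labels == region_id)
        if len(pixels) >= min_pixels:          # ignore tiny speckle
            row, col = pixels.mean(axis=0)      # centroid in pixel coordinates
            lakes.append({"id": region_id, "centroid_px": (row, col), "area_px": len(pixels)})
    return lakes
```

The returned centroids and areas, together with scene dates, would form the catalogue of lake locations and times of origin described above.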
RESOURCES
Helpful reading and images on the announced topic:
Potential satellite databases:
Haris Dindo
Chief Technology Officer at SHS Asset Management
Haris has worked on machine learning and knowledge engineering for over 15 years, both in academia and industry. He has advanced knowledge extraction, representation and manipulation – from both structured and unstructured data – in the fields of natural language processing, robotics, image processing and artificial intelligence.
Haris left academia and joined Yewno Inc in 2016 as their Chief Data Scientist, responsible for creating one of the first industrial-grade, dynamically changing Knowledge Graphs induced solely from textual data. For this work he was awarded a US patent entitled “Structuring Data in a Knowledge Graph”. Haris joined SHS Asset Management as Chief Technology Officer in February 2021 with the focus of building the first Knowledge Arbitrage-based investment fund. Haris has a PhD in Machine Learning and Robotics from Università degli Studi di Palermo.
Graph for Better Finance: Predict Global Crises
Domain Expert: Haris Dindo, Chief Technology Officer at SHS Asset Management
PROBLEM STATEMENT
Throughout history, we have seen how one country’s decisions and behaviors can affect others, if not the whole world. A striking example is the crisis of 2007/08, when the collapse of the US housing market dragged the whole world into a crisis. Another, more recent, example is the emergence of the COVID-19 pandemic. Beyond these, there are myriad other disruptive crises that spread between countries. Since history tends to repeat itself, one cannot help but wonder whether the effects of another country’s crisis on one’s own population could be minimized.
THE CHALLENGE
Given that all countries are connected to each other, construct a graph to see how much one country’s crisis will affect the others. Treat countries as nodes and their relationships as links. Leverage the different socio-economic and macroeconomic indicators captured for each country over time in order to predict a crisis, or to predict which other countries will be affected by one country’s crisis. This would help countries minimize the effects of another country’s crisis on their own citizens.
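As one possible starting point, the sketch below builds a toy country graph with networkx and propagates a shock from an origin country along weighted edges. The countries, edge weights, and damping factor are placeholders; real edge weights could come from trade, financial, or migration data.

```python
# Minimal sketch of the country-as-node idea: a weighted graph where edge weights
# approximate economic exposure, plus a simple shock-propagation routine.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("US", "MX", 0.8), ("US", "CN", 0.6), ("CN", "DE", 0.5),
    ("DE", "FR", 0.7), ("MX", "BR", 0.3),
])

def propagate_shock(graph, origin, damping=0.5, steps=3):
    """Spread a unit shock from `origin`; each hop transfers weight * damping."""
    impact = {n: 0.0 for n in graph.nodes}
    impact[origin] = 1.0
    frontier = {origin}
    for _ in range(steps):
        next_frontier = set()
        for node in frontier:
            for nbr in graph.neighbors(node):
                transferred = impact[node] * graph[node][nbr]["weight"] * damping
                if transferred > impact[nbr]:
                    impact[nbr] = transferred
                    next_frontier.add(nbr)
        frontier = next_frontier
    return dict(sorted(impact.items(), key=lambda kv: -kv[1]))

print(propagate_shock(G, "US"))   # countries ranked by estimated exposure
```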
RESOURCES
Following is a list of data resources that can be used to address the challenge, but there is much more available: from specific macroeconomic indicators to perceived sentiment around countries and their economies. Be creative!
Dave DeCaprio
Dave has over 20 years of experience transitioning advanced technology from academic research labs into successful businesses. His experience includes genome research, pharmaceutical development, health insurance, computer vision, sports analytics, speech recognition, transportation logistics, operations research, real time collaboration, robotics, and financial markets.
Dave is a cofounder and the Chief Technology Officer at ClosedLoop.ai. He founded ClosedLoop in 2017 with Andrew Eye to build a healthcare-specific data science and machine learning platform. ClosedLoop was selected as the winner of the AI Health Outcomes Challenge, a $1.6 million X-prize-style competition sponsored by the Centers for Medicare and Medicaid Services, and was named a Top Performer in Healthcare-focused AI in 2020.
Prior to founding ClosedLoop, Dave was involved in several successful startups as well as consulting for and advising both small and large organizations on how to innovate using technology with maximum impact. As part of the Human Genome Project, he was responsible for mapping all known biology onto the newly assembled genome sequences. At Bluefin Labs, acquired by Twitter in 2014, he helped the founders recruit a team and go from concept to launch with the NFL in 13 weeks. At GNS Healthcare, he led the development of their first-in-class causal machine learning platform. Dave graduated from MIT with a degree in Electrical Engineering and Computer Science and currently lives in Austin, TX.
Author: Dave DeCaprio, Founder and CTO of ClosedLoop AI
PROBLEM STATEMENT
COVID response efforts focus on using the wide array of publicly available data on COVID transmission and spread to better understand the virus, with the hope of being able to predict and mitigate future infection spikes.
COVID infections have occurred in several waves of rapidly increasing and then declining infections, with hospitalizations and deaths trailing infections by predictable lags. Many explanations have been proposed over the last two years for these waves, including new variants, weather, vaccinations, public health measures, and changes in individual decision making and risk tolerance. All appear to play a role, but the rise and fall of cases over time isn’t fully explained by any of these alone, nor do we have good insight into when the next wave will come, how long it will last, or how severe it will be. Several insights have come from analyzing the progression of these waves within different countries and within different regions of a country (counties or zip codes within the US, for example).
THE CHALLENGE
Develop a solution to analyze prior COVID waves and model the progression of new infections. This approach could inform policy makers to more proactively institute restrictions ahead of impending waves, and to remove restrictions that are unrelated to the caseload increases or have simply outlived their usefulness. The solution could also be used by healthcare organizations to plan for capacity surges by rescheduling elective procedures, and by any organization planning a large gathering to have more insight into COVID-related adjustments to their plans.
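As a rough illustration of wave detection on a single region’s case series, the sketch below smooths daily counts and flags sustained week-over-week growth as the onset of a wave. The synthetic data, smoothing window, and growth threshold are assumptions for demonstration only.

```python
# Minimal sketch: detect the onset of a COVID wave from a daily new-case series.
import pandas as pd
import numpy as np

# Synthetic daily case counts shaped like a single wave (placeholder data).
dates = pd.date_range("2021-11-01", periods=120, freq="D")
cases = pd.Series(1000 + 4000 * np.exp(-((np.arange(120) - 60) / 15) ** 2), index=dates)

smoothed = cases.rolling(7, center=True).mean()      # 7-day smoothing
growth = smoothed.pct_change(7)                      # week-over-week growth rate

in_wave = growth > 0.25                              # sustained >25% weekly growth
onsets = in_wave & ~in_wave.shift(1, fill_value=False)
print("Wave onsets:", list(onsets[onsets].index.date))
```

Running the same detector across many regions, and relating regions to one another in a graph, is one way to study how waves propagate geographically.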
RESOURCES
Author: Dave DeCaprio, Founder and CTO of ClosedLoop AI
PROBLEM STATEMENT
This challenge focuses on examining healthcare data to identify and learn from the “natural experiments” going on within the healthcare system as doctors and patients seek novel ways to address their illnesses.
Drugs and other medical treatments are approved in carefully controlled clinical trials that are designed to answer very narrow questions about the safety and efficacy of those treatments for very specific conditions and patient groups. However, treatments are often effective for a wider range of uses than what they are officially approved for, and such “off-label” usage is common. These treatments can be effective for patients, but often aren’t as well studied once the drugs are out on the market.
THE CHALLENGE
Develop a solution that combines various healthcare data sources to understand patterns of diagnosis and treatment that point to potential off-label usage of drugs. One example is clustering doctors with the drugs they typically prescribe and the diseases they typically treat, and then comparing those results to known databases of the diseases associated with particular drugs. Outliers in this space provide potential cues to off-label usage. This could be used to identify interactions missing from those databases that could be useful to doctors, patients, and researchers.
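A minimal sketch of the outlier idea might look like the following: count how often each (drug, diagnosis) pair co-occurs across prescribers and flag frequent pairs that are missing from a known-indication table. The records, indication table, and count threshold are toy placeholders.

```python
# Minimal sketch: flag frequent drug/diagnosis pairs not covered by known indications.
from collections import Counter

# (doctor_id, drug, diagnosis) tuples pulled from claims or prescription data (placeholders)
records = [
    ("dr1", "drugA", "hypertension"), ("dr2", "drugA", "hypertension"),
    ("dr3", "drugA", "migraine"), ("dr4", "drugA", "migraine"),
    ("dr5", "drugA", "migraine"), ("dr1", "drugB", "diabetes"),
]
known_indications = {("drugA", "hypertension"), ("drugB", "diabetes")}

pair_counts = Counter((drug, dx) for _, drug, dx in records)
candidates = [
    (pair, n) for pair, n in pair_counts.items()
    if pair not in known_indications and n >= 3     # threshold is arbitrary
]
print("Potential off-label signals:", candidates)
```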
RESOURCES
Dataset Example Resources:
Chun-Kit Ngan
Professor of Data Science at Worcester Polytechnic Institute
Dr. Chun-Kit Ngan is an Assistant Teaching Professor of Data Science at Worcester Polytechnic Institute (WPI). Before joining WPI, he was an Assistant Professor at Penn State University-Great Valley (PSU-GV). His research interest is Decision Guidance and Support Systems (DGSS), which guide domain-specific decision makers toward better decisions and provide them with actionable recommendations. He has published over 30 articles in various venues. He received the Best Paper and the Best Student Paper Awards at the 2013 and 2011 International Conference on Enterprise Information Systems. He was the recipient of the 2015-2016 Early Career Award for Research and Scholarship Excellence at PSU-GV. He is the Co-PI of the 2019-2022 NSF REU Site Data Science for Healthy Communities in the Digital Age at WPI and the PI of the Clinical Decision Support System project funded by Diameter Health in 2022. Other DGSS-related projects have been technically supported by Vodafone GmbH and John Snow Labs.
Graph for Better Health: Detect Early COVID-19 Mutated Variants
Author: Dr. Chun-Kit Ngan, Professor of Data Science at Worcester Polytechnic Institute
MOTIVATIONAL BACKGROUND
Severe COVID-19 mutated variants have continued to evolve over time, threatening lives across the public community.
THE CHALLENGES
Coronaviruses are a family of single-stranded RNA viruses that can transmit infections between humans and have been documented for over 50 years. Coronavirus disease 2019 (COVID-19), first reported in December 2019, was initially referred to as the Wuhan coronavirus or the 2019 novel coronavirus. Since then, COVID-19 cases have continued to increase, with several different mutated variants, e.g., Theta (lineage P.3), Alpha (lineage B.1.1.7 with E484K), Delta (lineage B.1.617.2), and Omicron (lineage B.1.1.529), to name a few. People with COVID-19 and its mutated variants may have a wide range of symptoms, e.g., trouble breathing, persistent pain in the chest, and inability to stay awake. These complications can range from mild symptoms to severe illness that can ultimately result in death. Thus, an effective approach for detecting the emergence of these mutated variants in advance is needed so that medical practitioners and researchers can intervene early and take prompt, appropriate action regarding treatments and vaccines to mitigate adverse effects on the public.
OBJECTIVE
The purpose of this project is to develop and advance a graph-based multivariate time-series anomaly detection approach, based upon state-of-the-art graph-neural-network architectures, to detect the emergence of COVID-19 mutated variants earlier than currently possible.
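As a much-simplified stand-in for the graph-neural-network approach (not a GNN itself), the sketch below links each signal to its most-correlated neighbors, predicts it from them, and scores anomalies as large deviations from that prediction. The synthetic signals, the top-k correlation graph, and the injected anomaly are purely illustrative.

```python
# Minimal sketch of graph-based multivariate anomaly detection: each series is
# compared against an expectation formed from its graph neighbors.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
signals = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("ABCDE")).cumsum()
signals.loc[180:, "C"] += 15          # inject an anomaly into one signal

corr = signals.diff().corr().abs()    # edge weights = correlation of day-to-day changes

def anomaly_scores(df, corr, k=2):
    scores = {}
    for col in df.columns:
        nbrs = corr[col].drop(col).nlargest(k).index       # graph edges = top-k correlations
        expected = df[nbrs].mean(axis=1)                    # neighbor-based expectation
        resid = df[col] - expected
        scores[col] = (resid - resid.mean()).abs() / (resid.std() + 1e-9)
    return pd.DataFrame(scores)

print(anomaly_scores(signals, corr).tail(3).round(1))       # column "C" stands out
```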
RESOURCES
● Dataset Resource: https://github.com/echen102/COVID-19-TweetIDs
● Technology Resources: https://docs.tigergraph.com/home/
María Laura García
President and Founder of GlobalNews® Group; Chairwoman of the Business Committee at the World Innovation and Change Management Institute
Entrepreneur and media expert, particularly interested in the challenges and transformations facing the digital world, the media and the information industry, and how the media intelligence business ecosystem must adapt to evolve and allow for better decision-making.
Founder and President of GlobalNews®️ Group, the premier source of media monitoring in Latin America with a presence in 10 countries. María Laura is an investor in multiple regional startups. She also serves as Chairwoman of the Business Committee of the World Innovation and Change Management Institute (WICMI), based in Geneva, and as an advisory board member of the Inicia entrepreneurship community. Strongly committed to women’s development in business and economic activities, María Laura is a mentor and Vice President of Vital Voices in the Southern Cone. Previously, she served as President of the Fédération Internationale des Bureaux d’Extraits de Presse (FIBEP).
Author: María Laura García, President and Founder of GlobalNews® Group
PROBLEM STATEMENT
Human thought is prone to errors. In recent times, we have gotten better at understanding why those errors occur and how to exploit them. Modern advances in technology like artificial intelligence, big data, cloud computing, and blockchain can all be used to manipulate our cognitive biases in order to sell us products and services or to make us value certain information over other information. One of the most notable cases is the exploitation of confirmation bias by social media companies to create “filter bubbles” of opinions a person already agrees with. This not only prevents us from being well informed, but ultimately leads to a lack of critical thinking and adds to polarization. If we only listen to and validate one way of thinking, and do not expand and question it, sooner or later we will become intolerant citizens. And when there is no more tolerance, the very foundations of our democratic coexistence begin to crack.
THE CHALLENGE
In a context in which humanity is facing major challenges and democracies are being questioned throughout the world, emerging technologies can represent either a threat or an opportunity. In this case, the key is to identify a way to use these technologies to break through the confirmation bias and the filter bubbles enhanced by social media platforms. The aim is to foster critical thinking in digitally literate citizens who access and critically relate to different types of information, and who increasingly engage in dialogue with other points of view and with those who think differently. The goal is not to come up with an alternative business model for digital platforms or to eliminate the confirmation bias of the human mind, which would be almost impossible. Rather, it is to empower citizen-users with the necessary tools to be able to identify the presence of these biases and to consciously seek diverse perspectives and opinions on the same topic.
In order to understand how confirmation bias is affecting a given user on social media, a possible approach would be to build a model that analyzes clusters of accounts that publish and repost similar articles (especially using public social media data, for example from the Twitter API), and then, given a user’s account, finds which media sources are most likely to appear in that user’s community (up to second-level connections). Then, using similarity or topic analysis of articles published through a news API (for example https://newscatcherapi.com/ or https://newsapi.org/ ), suggest articles that are similar to those the user is seeing in their feed (so as to maintain interest) but that come from sources not usually present in their community.
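A minimal sketch of that approach, assuming a toy follow/repost graph and placeholder article texts in place of real Twitter and news API data, might look like this:

```python
# Minimal sketch: find sources inside a user's two-hop community, then recommend
# topically similar articles from sources OUTSIDE that community.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

follows = nx.Graph([("user", "outletA"), ("user", "friend1"), ("friend1", "outletB"),
                    ("outletC", "friend2")])
community_sources = set(nx.ego_graph(follows, "user", radius=2)) - {"user"}

feed_article = "new climate policy sparks debate over energy prices"
candidates = {  # articles from a news API (placeholder texts)
    "outletC": "energy prices and the climate policy debate: an opposing view",
    "outletD": "local team wins championship after dramatic final",
}
outside = {src: txt for src, txt in candidates.items() if src not in community_sources}

texts = [feed_article] + list(outside.values())
tfidf = TfidfVectorizer().fit_transform(texts)
sims = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
for (src, _), score in sorted(zip(outside.items(), sims), key=lambda x: -x[1]):
    print(f"{src}: similarity {score:.2f}")   # similar topic, different community
```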
RESOURCES
Books and articles
Possible data sources
Research institutes
Ashleigh Faith, PhD
Director, Knowledge Graph and Semantic Search at EBSCO
Ashleigh Faith is the Director of Knowledge Graph and Semantic Search at EBSCO, as well as a teacher of graph and data science on YouTube. She holds a PhD focused on advanced semantics. She has worked on semantic search, knowledge graph machine learning, and data modeling for over 15 years with notable corporate and government entities such as GM, NASA, the US Navy, Amazon, Microsoft, NATO, Gulfstream, The Fed, and NLM. Her main focus is knowledge graphs, semantic search, and graph architecture.
Graph for Better Learning: Reduce The Noise Of News Search
Author: Ashleigh Faith, Director, Knowledge Graph and Semantic Search at EBSCO
PROBLEM STATEMENT
Google, while a wonderful resource for quick-fix questions, starts to repeat search results after the third page, and this is especially true for news articles. A big reason is duplicate resources: common syndication agencies like the Associated Press and all the newspapers that republish their articles, as well as reshares and reposts, artificially inflate the volume of an article/post and its apparent importance. Re-posts and re-shares are also often changed slightly so that Google does not see them as duplicates. This inflates the importance of some posts (making them go “viral” unnecessarily) and produces a noisy Google search experience that may hide more relevant news articles from end users.
THE CHALLENGE
How can news articles with the same content be identified and associated with each other in order to prevent inflation of information importance? Take cues from copyright detection or song recognition as you design your solution. Attempt to identify duplicate news articles that you might scrape from Google or internet search results and what sources those articles commonly come from. How can this information be used to better enable the public to make sure they’re getting the most important and diverse information?
POSSIBLE APPROACH
The desired state would likely be a hyper-node graph (https://aclanthology.org/2020.emnlp-main.596.pdf) that represents the common metadata for a cluster of duplicate or near-duplicate articles/posts and how their metadata relates to one another (with a similarity score, so some data science would be needed here). The individual articles and their metadata would be clustered together as relations to the hyper-node. Each hyper-node would in effect represent all versions of the individual articles and posts that are duplicates and give a normalized representation of the article/post. This hyper-node and its metadata can then be used to group articles/posts together in a search application, minimizing noisy news search and helping end users identify whether an article/post is actually “going viral” or is just overhyped and not worth their time.
To scope this solution, take 30-50 news articles and posts (with an even distribution if possible) and create a hypergraph of as many duplicate or near-duplicate articles as you can find (use the metadata to determine the similarity of a duplicate). Document the metadata for each article/post in your dataset and assess the metadata for duplicate information to create a similarity score; the most similar articles/posts will form the cluster of articles/posts related to each hyper-node. You decide the threshold for similarity, but 75% (0.75 f-score) similarity on metadata fields is the lowest recommendation likely to produce good results. Make sure to document the normalized information (the data the clustered articles/posts have in common) as metadata for the hyper-node, along with the similarity between each hyper-node. Representing the similarity of metadata between hyper-nodes will allow the solution to scale, so that as new articles/posts are published, the metadata can be queried to identify whether the new article/post is a duplicate of an existing one or something new.
The desired state would have two outputs: the first is the model and its populated hyper-graph, and the second is the similarity model, likely a machine learning model. The hyper-graph can be scoped to have 30-50 hyper-nodes, with at least 2 duplicate or near-duplicate articles associated with each hyper-node (so a total dataset of 60-100 individual articles/posts and their metadata). Each hyper-node will have the normalized metadata of the articles it represents, as well as the similarity score of the individual articles to one another and the similarity score between each hyper-node. The machine learning model should be open-source on GitHub and flexible enough to be pointed at any news dataset that has standard metadata, such as Google News or social media news feeds like Twitter, and should allow the similarity threshold to be modified.
This solution should be able to be used to 1.) identify duplicates in a static news dataset in the graph, 2.) identify if a new article is a duplicate of an existing article in the graph, 3.) enable others to use the similarity model on news datasets, and 4.) allow for a search engine to traverse the graph and retrieve the hyper-node (and the articles/posts it relates to) for retrieval and display, similar to how Google Scholar represents similar academic articles.
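To make the construction concrete, here is a minimal sketch that scores pairwise article similarity with TF-IDF, links pairs above the 0.75 threshold from the scoping above, and treats each connected component as one hyper-node. The articles are toy placeholders, and a full solution would score structured metadata fields as well as text.

```python
# Minimal sketch of hyper-node construction from pairwise article similarity.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    "a1": "Storm batters coast, thousands lose power",
    "a2": "Thousands lose power as storm batters the coast",   # near-duplicate of a1
    "a3": "Markets rally on strong earnings reports",
}
ids = list(articles)
sims = cosine_similarity(TfidfVectorizer().fit_transform(articles.values()))

G = nx.Graph()
G.add_nodes_from(ids)
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        if sims[i, j] >= 0.75:                 # similarity threshold from the scoping above
            G.add_edge(ids[i], ids[j], similarity=float(sims[i, j]))

# Each connected component becomes one hyper-node holding its member articles.
hyper_nodes = [sorted(component) for component in nx.connected_components(G)]
print(hyper_nodes)   # e.g., [['a1', 'a2'], ['a3']]
```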
RESOURCES
Dataset Example Resources:
Graph for Better Living: Manage Your Personal Identity
Author: Ashleigh Faith, Director, Knowledge Graph and Semantic Search at EBSCO
PROBLEM STATEMENT
More data is generated per day by one person than a typical space mission generates. With our data being used in ways we do not expect, or in ways that are potentially unethical, more citizens want to learn how to better track and manage their data. The concept of personal knowledge graphs has emerged, but these are still focused on the information someone wants to keep, not on their personal data and how companies are using it. With fraud and other malicious activity running rampant, a simple way to see the network of where your information has been shared, what information was shared, and when, would help citizen scientists track their data, ask to be forgotten, track risks to their information, and identify where a breach may have occurred.
THE CHALLENGE
We can do better. A model that an individual can use without any graph experience would allow people to track their own information and make better decisions about who has their data and what is being done with it. This model would include the most common data generated by an individual, such as nodes for personal information like birth date, unique identifiers like a social security number, health records, bank information, etc.; how each piece of information is used by businesses, services, and institutions; and when the information was shared or updated, all for a specific person. The specific values would not be entered, for security reasons, but the user would know where their birth date was shared, who it was shared with, and when, all in an easy-to-use, no-code data entry tool with a graphical visualization to help users track their data on their own.
Imagine the scenario where your mom needs to track down all the places her bank account and routing information is stored, perhaps she knows now that checks are not all that secure and wants to protect herself. This solution should enable her to not only enter where this information is stored across her network of information, but also should allow her to find the specific institutions or types of institutions where her specific account and routing number are currently. Because of GDPR, your mom can now ask these companies to forget the sensitive information.
Take for example another scenario where you want to track your spending habits to identify areas to save money. This solution would allow you to enter where and when you shop and what you buy there, and general price point as a stretch. This would enable you to identify trends in your purchase habits and perhaps decide coffee is a better caffeine solution than expensive energy drinks.
In both of these scenarios, the solution should model the basic nodes in someone’s day-to-day life, like companies, phone numbers, emails, shared sync accounts, people, and more (with the option for each node to have specific metadata such as a label type, a label name like Bank of America or CVS, and a value such as $10), along with a set list of relations like purchased on, added on, added by, and started service, plus a general relation so the list is not too prescriptive (remember, this is for laypeople who usually don’t understand relations between nodes). A minimal schema sketch follows below.
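Here is that minimal schema sketch, using networkx for illustration; the node types, relation names, and entries are placeholders, and no actual personal values are stored, only where and when a data type was shared.

```python
# Minimal sketch of a personal data graph: what kind of data was shared, with whom, and when.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("me", type="person")
g.add_node("birth_date", type="personal_data")       # the value itself is NOT stored
g.add_node("bank_account", type="personal_data")
g.add_node("Bank of America", type="institution")
g.add_node("CVS", type="company")

g.add_edge("me", "birth_date", relation="owns")
g.add_edge("birth_date", "CVS", relation="shared_with", date="2021-04-02")
g.add_edge("bank_account", "Bank of America", relation="stored_at", date="2019-07-15")

# "Where is my bank account information held?" (the mom scenario above)
holders = [(v, d["date"]) for _, v, d in g.out_edges("bank_account", data=True)
           if d.get("relation") == "stored_at"]
print(holders)
```

A no-code front end would simply populate this kind of structure from form fields and render it visually.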
RESOURCES
Dataset Example Resources:
Usha Rengaraju
Principal Data Scientist at MUST Research
Usha Rengaraju is a principal data scientist and polymath with a global reputation, currently working at MUST Research and Infinite-Sum Modelling. Specializing in probabilistic graphical models, machine learning, and deep learning, Usha is a strong advocate for neurodiversity and organized India’s first-ever conference on neurodiversity, the Neurodiversity India Summit, in 2020. She also organised NeuroAI, India’s first-ever research symposium at the interface of neuroscience and data science.
Usha loves competitive data science and is a 2x Kaggle Grandmaster. She recently took the TEDx stage with a talk about the need for women in AI, and she was ranked among the top ten data scientists in India by Analytics India Magazine in 2020 and 2021. She has developed curricula at top global universities and is a current participant in the Stanford Scholar Initiative.
Graph for Better Systems: Develop Effective Public Transportation Systems
Author: Usha Rengaraju, Principal Data Scientist at MUST Research
PROBLEM STATEMENT
Public transport is the primary mode of travel in many countries and an efficient and well-planned public transportation network can save time for commuters. Public transportation can be the most eco-friendly option for commuting and hence it is important to make the journey of commuters as optimal as possible. Graphs are used for modeling and navigating complex network systems like public transportation.
THE CHALLENGE
Develop an effective public transportation system that is easier for commuters to navigate. Traffic data modeling has a wide spectrum of applications, such as alleviating traffic congestion, supporting better travel decisions, and improving the quality of travel for end customers. Traffic flow prediction and anomalous events such as accidents can be studied through traffic data modeling. Complexity and nonlinear spatio-temporal correlations, coupled with the highly dynamic nature of road networks, make it challenging to model traffic data. The inherent graph structure of traffic networks opens the door to many graph-based deep learning models, which have achieved significant results in comparison with other traffic data modeling approaches. Many current graph approaches, however, are unable to consider multiple passenger characteristics such as total travel time, minimum number of transfers between stations, and total distance of travel. Participants should come up with novel approaches to build a public transportation system that considers several parameters, like weather, nearby events, distance, waiting time, travel time, and number of transfers.
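As a small illustration of multi-criteria routing, the sketch below attaches travel time, waiting time, and a transfer flag to each edge and chooses a route by a weighted composite cost. The tiny network and the weighting coefficients are assumptions for demonstration only.

```python
# Minimal sketch of multi-criteria transit routing with a composite edge cost.
import networkx as nx

G = nx.DiGraph()
edges = [  # (from, to, travel_min, wait_min, is_transfer)
    ("A", "B", 10, 2, 0), ("B", "C", 12, 4, 1),
    ("A", "D", 18, 1, 0), ("D", "C", 9, 3, 0),
]
for u, v, travel, wait, transfer in edges:
    G.add_edge(u, v, travel=travel, wait=wait, transfer=transfer)

def composite_cost(u, v, data, w_travel=1.0, w_wait=0.5, w_transfer=8.0):
    """Penalize transfers heavily relative to minutes of travel and waiting."""
    return w_travel * data["travel"] + w_wait * data["wait"] + w_transfer * data["transfer"]

route = nx.shortest_path(G, "A", "C", weight=composite_cost)
print(route)   # e.g., ['A', 'D', 'C'] once the transfer penalty dominates
```

Weather, nearby events, and real-time delays could enter the same cost function as additional edge attributes.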
RESOURCES
Ellie Young
Founder at Common Action
Ellie Young is founder of Common Action. Common Action is an innovation community facilitating collaborative climate and SDG/ESG action with data tech, and specifically knowledge graphs (KG). With community & software products, CA seeks to enable cross-sector groups to converge, share and report data, and dynamically organize towards urgent global priorities.
Previously the Head of Community at the Knowledge Graph Conference, Ellie integrates tech and community tools with sustainability domain expertise to design and implement systems for complexity management.
Graph for Better World: Enable Search For United Nations Sustainable Development Goals
Author: Ellie Young, Founder at Common Action
PROBLEM STATEMENT
Over the course of the pandemic, the world has broadly converged on the need to increase action across environmental and social sustainability challenges. The UN Sustainable Development Goals (SDGs), developed by thousands of global community representatives together with the United Nations, represent the authoritative set of sustainability goals for humanity to achieve by the end of this decade. Covering interconnected social, environmental, and economic targets, the SDGs represent urgent, life-saving action for billions of people worldwide.
Because the SDGs span interconnected systems, such as climate change, biodiversity, and poverty, impacts in one target area are often related to dynamic phenomena in other areas. This presents both a challenge and an opportunity for development efforts: although the interconnected nature of these challenges makes for complex implementation conditions, there is also the opportunity to combine synergies between project efforts to address multiple goals simultaneously. For example, in impoverished rural areas, school systems can be set up to address education and encourage female empowerment at the same time, creating impact across three SDG goals.
Thus, a knowledge graph system has the potential to enable collective intelligence and coordination between actors, maximize resources, and ultimately increase impact.
However, although the interconnections between target areas are studied by scientists and international experts, no complete graphical or visual representation yet exists. Further, domain experts typically publish their findings in scientific papers and institutional reports. Therefore, the causal links between these elements are represented primarily in unstructured text, which remains distributed across a vast set of publications; for instance, the World Bank Open Knowledge Repository contains over 33,000 reports, and forms just one portal of development reports.
THE CHALLENGE
To unlock this vast information resource and support swifter and more impactful worldwide SDG action, participants are challenged to expose the rich contextual data published in reports from various development publication portals, such as the World Bank, United Nations, International Finance Corporation, and similar. The winning solution will identify granular concepts and goals related to each of the 17 SDGs, and ideally how those concepts are interlinked with each other. For example, what topics are mentioned in the IPCC climate change report that are also discussed in the Millennium Ecosystem Assessment? These may also be enriched with additional data sources, such as open datasets. The application should be user-friendly and easy for a non-technical audience to navigate and understand. A small concept-tagging sketch appears after the goal list below.
The Sustainable Development Goals
#1 No Poverty
#2 Zero Hunger
#3 Good Health and Well-Being
#4 Quality Education
#5 Gender Equality
#6 Clean Water and Sanitation
#7 Affordable and Clean Energy
#8 Decent Work and Economic Growth
#9 Industry, Innovation and Infrastructure
#10 Reduced Inequalities
#11 Sustainable Cities and Communities
#12 Responsible Consumption and Production
#13 Climate Action
#14 Life Below Water
#15 Life on Land
#16 Peace, Justice and Strong Institutions (violence, corruption etc)
#17 Partnerships for the goals
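Here is the concept-tagging sketch referenced above: a keyword lexicon per goal (only three goals and a few illustrative keywords shown) and a scan that tags each sentence of a report with the goals it touches, which could then become edges in a knowledge graph.

```python
# Minimal sketch: tag report sentences with SDGs via a keyword lexicon.
import re

sdg_keywords = {
    "SDG 1 No Poverty": {"poverty", "impoverished", "income"},
    "SDG 4 Quality Education": {"education", "school", "literacy"},
    "SDG 5 Gender Equality": {"gender", "women", "girls", "empowerment"},
}

report = ("In impoverished rural areas, school systems can be set up to address "
          "education and encourage female empowerment simultaneously.")

tags = []
for sentence in re.split(r"(?<=[.!?])\s+", report):
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    hits = [goal for goal, kws in sdg_keywords.items() if words & kws]
    if hits:
        tags.append((sentence, hits))
print(tags)   # the example sentence touches SDGs 1, 4, and 5
```

A stronger solution would replace the keyword lexicon with entity linking or embedding-based similarity over the full report corpora.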
RESOURCES
Here is a starting list of recommended reports and data to begin:
Check out available datasets with our partner, Data.World, for inspiration and ideas.
Graph for Better World: Create STEM Opportunities for Women
Author: McKenzie Steenson, Computer Science Undergraduate Student at Boise State University | Developer Advocate Intern at TigerGraph
PROBLEM STATEMENT
“Women make up only 28% of the workforce in science, technology, engineering and math (STEM), and men vastly outnumber women majoring in most STEM fields in college. The gender gaps are particularly high in some of the fastest-growing and highest-paid jobs of the future, like computer science and engineering.” (AAUW)
Women in STEM face economic and educational challenges that may contribute to the low number of women continuing on in STEM fields after college: “38% of women who major in computers work in computer fields, and only 24% of those who majored in engineering work in the engineering field.” (Pew Research) Until these gaps can be closed, companies and communities must come together to support women to succeed in math and science. The American Association of University Women outlines four keys to closing the STEM gap, yet dynamic, global solutions do not yet seem to exist. Giving women opportunities to advance their careers and connect with other strong women, especially in STEM careers, can help narrow the pay gap, which will in turn enhance women’s economic security. It will also continue to develop a diverse and successful STEM workforce and help prevent bias in the products and services created, which benefits all.
THE CHALLENGE
Here is a sample application, STEM Women, that highlights connecting women in STEM. The tool, a simple searchable database, was created to combat the lack of representation of women in STEM, but it is focused only on Australia rather than globally. Graph can be used to build a database of many people, places, and things and connect them all together by their relationships, and the power of graph technology can take those relationships global. For example, TigerGraph’s machine learning applications for recommendation systems can surface the right job postings/openings, networking opportunities, diversity/inclusion events, mentorships, and more to women in STEM and related fields. The winning solution will use graph to help create more economic opportunities for women pursuing or currently in STEM careers. Technology-driven solutions in the technology-driven field of STEM will continue the pursuit of closing the STEM gap, providing support and opportunity for all.
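As a small illustration of the recommendation idea, the sketch below connects people and opportunities to shared attribute nodes (skills, interests) and ranks opportunities by how many attributes they share with a given person. The names and attributes are placeholders; a production version might use TigerGraph's graph algorithms instead of this toy scoring.

```python
# Minimal sketch: graph-based recommendation via shared attribute nodes.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "python"), ("alice", "data science"), ("alice", "mentorship"),
    ("job:ML engineer", "python"), ("job:ML engineer", "data science"),
    ("event:Women in Data meetup", "data science"), ("event:Women in Data meetup", "mentorship"),
    ("job:civil engineer", "structural design"),
])

def recommend(graph, person, prefixes=("job:", "event:")):
    """Rank opportunity nodes by the number of attributes shared with `person`."""
    attrs = set(graph.neighbors(person))
    scores = {}
    for node in graph.nodes:
        if node.startswith(prefixes):
            scores[node] = len(attrs & set(graph.neighbors(node)))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend(G, "alice"))
```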
RESOURCES
Use your imagination and unique data resources to capture consumer sentiments and relationships between their purchasing behavior and preferences.
Dataset Example Resources:
Graph for Better World: Find Ethically-Sourced Goods
Author: Daniel Barkus, Developer Advocate at TigerGraph
PROBLEM STATEMENT
Consumers are increasingly aware of the “process” that goes into producing their goods. These processes are not always aligned with good ethics and are often built on exploiting labor in less wealthy regions of the world. Various organizations attempt to keep track of these labor and human rights violations, as well as the companies that profit from them, but that data is scattered, and companies often hide behind shell corporations or private investments to keep their names from being associated with the abuses they exploit. Until there is consumer awareness and brand accountability to force these companies to change, these exploitations will continue to run rampant, shortening the lifespans of workers in developing countries and poisoning the land those people survive on.
THE CHALLENGE
Corporate investments, LLC filings, parent company information, and parts sourcing are all publicly available information. By combining these many different financial, hierarchical, and human rights data sources, it is possible to show not only which companies are directly supporting these rights violations, but also which products are a direct result of them. Consumers should know exactly what exploitation goes into the products they consume and how the production and consumption of those products directly impacts the regions that produce them.
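A minimal sketch of that traversal, using an entirely fictitious ownership chain and violation list, could look like this:

```python
# Minimal sketch: trace a product to rights violations through ownership/supply links.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Phone X", "AssemblerCo", {"relation": "manufactured_by"}),
    ("AssemblerCo", "MineralSupplier Ltd", {"relation": "supplied_by"}),
    ("MineralSupplier Ltd", "Shell Holding LLC", {"relation": "owned_by"}),
])
# Placeholder violation records keyed by entity name.
violations = {"MineralSupplier Ltd": ["documented forced-labor report (placeholder)"]}

def exposure(graph, product):
    """Entities reachable from the product that appear in the violation list."""
    reachable = nx.descendants(graph, product)
    return {entity: violations[entity] for entity in reachable if entity in violations}

print(exposure(G, "Phone X"))
```

The same reachability query, run over real filings and supply-chain data, is what would let a consumer see the exploitation behind a specific product.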
RESOURCES
Graph for Better X: Design Your Problem Statement
Author: You
The mission of the Graph For All Million Dollar Challenge is to ignite the passion and innovativeness of people from around the world to solve real problems using graph analytics. For that reason, we are inviting you to design your own problem statement that you want to solve. If you select this option, you will be asked to describe the problem you are trying to overcome as part of your submission.
Your problem statement should cover the following:
Identify the problem you are trying to solve
Give a two- to three-sentence explanation of why the problem is important and who it affects, a quick description of what’s currently being done and why it is not enough, and a worst-case scenario of what happens if the problem continues to go unchecked.
What is the challenge?
Talk about what’s been done: X has failed in the past, but now we have graph technology to take another shot at the problem. Outline a grand vision for the solution and describe what your optimal outcome would be.
RESOURCES
Dr. Jay Yu is the VP of Product and Innovation at TigerGraph, responsible for driving product strategy and roadmap, as well as fostering innovation in the graph database engine and graph solutions. He is a proven hands-on full-stack innovator, strategic thinker, leader, and evangelist for new technology and products, with 25+ years of industry experience ranging from a highly scalable distributed database engine company (Teradata) and a B2B e-commerce services startup to a consumer-facing financial applications company (Intuit). He received his PhD from the University of Wisconsin-Madison, where he specialized in large-scale parallel database systems.
Todd Blaschka is a veteran of the enterprise software industry. He is passionate about creating entirely new segments in data, analytics, and AI, with the distinction of establishing graph analytics as a Gartner Top 10 Data & Analytics trend two years in a row. By focusing intently on critical industry and customer challenges, the companies under Todd’s leadership have delivered significant, quantifiable results to the largest brands in the world through a channel and solution sales approach. Prior to TigerGraph, Todd led go-to-market and customer experience functions at Clustrix (acquired by MariaDB), Dataguise, and IBM.