Big Data Ethics & Development

Introduction

This research work uses various lenses to understand big data ethics and to interpret its implications. The study focuses on the key constructs of big data ethics and on a diverse set of topics related to different perspectives.

The paper will start with a thorough explanation of the various constructs of big data ethics using different lenses and principles. Each of the lenses and principles of big data ethics will be explored. Topics like privacy, consent, confidentiality, transparency, openness, economic valuation, ownership, and the role of the institutional review board in assessing risk to human subjects will be reviewed. The paper will evaluate the ethical issues, access restriction limits, and cost-benefit analysis from a philosophical and practical point of view. Finally, a prescription of innovative approaches to the practice of meeting ethical and legal requirements will be provided.

Big Data

Big data has become a major trend because of its exponential growth and the demand it creates for smart data management (Joshi, 2015). Its characteristics include velocity, volume, value, variety, and veracity. Big data typically comes in one of three forms: structured, unstructured, or semi-structured. The nature and sensitivity of a dataset vary with the industry it was collected from. Structured datasets are highly organized and formatted, and are usually classified as quantitative data arranged in rows and columns. For example, a bank database, which is structured data, might contain very sensitive information about a customer’s financial records that needs to be protected.

Unstructured data, on the other hand, is qualitative in nature and takes the form of text, audio, image, or video files. An example is a patient’s X-ray image held by a hospital together with the patient’s name and address. Unstructured datasets are unorganized and can be stored in a NoSQL database. Semi-structured datasets sit between structured and unstructured data. They include tags and semantic markers that organize data into records and fields, as in JSON and XML. Timofeeva (2019) described the business benefits of using big data and analytics technologies in the retail industry. The article drew attention to data-driven and analysis-based approaches to commerce whilst identifying leading software solutions and their capabilities.
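The three forms described above can be illustrated with a minimal Python sketch. The customer record, field names, and values are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical bank record).
structured_csv = "customer_id,name,balance\n1001,Jane Doe,2500.75\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: tagged fields; nesting and optional keys are allowed.
semi_structured_json = """
{
  "customer_id": 1001,
  "name": "Jane Doe",
  "accounts": [
    {"type": "checking", "balance": 2500.75},
    {"type": "savings", "balance": 10000.00, "joint_holder": "John Doe"}
  ]
}
"""
record = json.loads(semi_structured_json)

# Unstructured: raw bytes with no field layout (here, the first bytes of a PNG image header).
unstructured_blob = bytes([0x89, 0x50, 0x4E, 0x47])

print(rows[0]["balance"])                     # a column value, read as text
print(record["accounts"][1]["joint_holder"])  # a nested, optional field
print(len(unstructured_blob))                 # only the size is known without further processing
```

In each case the sensitive content (a name, a balance, an image of a patient) is the same kind of information; only the amount of built-in structure differs, which affects how easily the data can be queried and protected.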

With the emergence of big data, organizations are able to collect customer data from several sources, such as social media, web visits, or call logs, to improve the customer interaction experience. This allows them to offer customized deals to their clients and reduce customer churn. Big data also enables corporations to identify fraud patterns and to speed up compliance and regulatory reporting. Due to the sensitive nature of some of these datasets, companies have to implement an extra layer of protection, such as encryption, when transferring, storing, and monitoring big datasets.

Big Data Ethics

Big data ethics is described as outlining, guarding, and recommending concepts of right and wrong practice in relation to using data, particularly personal data. A review of data ethics in a changing world (Hand, 2018) addresses how the automation of data collection and the use of sophisticated tools for analyzing real-time data are radically changing data practice. This has given rise to philosophical and legal debates about the right, legitimate, and proper ways to use such datasets. The author expresses the need to consider the ethics of both current and future uses of big data.

The goal of big data ethics is to create an ethical and moral code of conduct for data use (Hand, 2018). Big data ethics addresses the generation, collection, sharing, and use of big data, while considering how big data practices respect values like privacy, fairness, and transparency. It provides the principles behind how organizations gather, protect, and use datasets. Categories of big data ethical concern include the misuse of personal information, private information becoming public, and discrimination against individuals.

Ethical Responsibilities of Big Data Experts

Big data experts comprise data scientists, data engineers, and other IT professionals who utilize big data. These experts need to abide by ethical principles at all times and understand the consequences of working with big data. They must be certain their data analysis is unbiased and without prejudice. They must seek the permission of the participants whose data they will be analyzing, and make sure the participants understand how their data will be used before obtaining their explicit and informed permission. Ethical issues for a data scientist typically arise when a dataset is used for something other than the purpose for which it was collected. Businesses must develop a data-driven culture in which employees understand the limitations, expectations, and restrictions of big data development.

Ways to Avoid Big Data Ethics Breaches

Corporations need to build a data-driven culture of protection to guarantee the safe and ethical use of data. They can avoid big data ethical breaches by prioritizing data privacy, providing robust security measures, and consistently evaluating and improving privacy practices. By implementing these data privacy strategies, organizations can guard sensitive data, build trust with stakeholders, and avoid prospective legal and reputational risks. Clear policies and procedures for handling big data should be established, together with continuous education of employees to promote a culture of transparency and accountability. It is also important to implement privacy by design, carry out privacy impact assessments, and regularly review and update privacy practices within the company.

Scenarios Where Big Data Ethics Were Breached & How They Were Resolved

In discussing big data ethics, it is useful to review instances of ethical breaches. In 2017, Equifax Inc., an American multinational consumer credit reporting agency, experienced a data breach in which the private records of many American, Canadian, and British citizens were compromised (Kolevski et al., 2021). The company reached a settlement with the United States Federal Trade Commission and offered affected users settlement funds in addition to free credit monitoring. Equifax received serious criticism for its inadequate security measures and slow response to the breach. The incident led users to raise serious concerns about the ethics of data security and the obligations of companies to protect user data. By contrast, Apple’s privacy commitment to its customers and IBM’s AI ethics policy illustrate positive data ethics practices and equitable decision-making within these organizations.

Advantages & Disadvantages of Big Data Ethics

Corporations gain several business benefits when they create and maintain a structured and transparent big data ethics strategy. The first benefit is trust: when businesses apply the key ethical principles of fairness, privacy, transparency, and accountability to their big data algorithmic models, clients can retain trust in how the organization utilizes the data it acquired from them. This also builds goodwill and loyalty for the company and boosts its brand value and reputation.

Most consumers are willing to pay a premium to transact with companies they can trust. Once that trust is broken by a business using customers’ personal data irresponsibly, customers will stop doing business with that corporation. In addition, companies that adhere to big data ethics demonstrate fairness in their decision-making. Several areas of concern in big data ethics highlight the potential for unethical use of data. Companies that have developed a common code of ethics are better prepared to maintain a sustainable ethical stance.

Review of Topics

Privacy:

Privacy concerns how a customer’s data is expected to be treated with a high level of confidentiality, often because of an assurance the organization has given to the customer. Data breaches have become very frequent over time, however, making clients hesitant to supply their information in certain instances. To encourage more customers to share their information, organizations need to do a better job of educating consumers about the difference between privacy and secrecy.

Privacy ethics entails many different concepts, such as data protection, data exposure, liberty, autonomy, and data security. Big data privacy covers the condition of privacy, the right to privacy, and the loss and invasion of privacy. The scale and velocity of big data pose a worrying concern, since traditional privacy processes do not adequately protect sensitive data. This could lead to exponential growth in cybercrime and data leaks. There have been several incidents of corporations suffering a breach.

These breaches can result from many kinds of data protection errors, such as an unsecured Elasticsearch database. During such incidents, a hacker might be able to access and scrape a database containing people’s names, contact details, and connected social media account login names, among other things. The growing analytical power of big data raises a further concern about its impact on people’s privacy. When personal data from various digital platforms are mined and combined, they can create a full picture of someone without that person’s explicit consent.

A study presenting the Data Ethics Decision Aid, a dialogical framework for ethical inquiry of AI and data projects in the Netherlands, highlighted objections that have been raised to the lawfulness of government data projects in an international context (Franzke et al., 2021). The paper elaborated on how these developments gave rise to concerns regarding citizens’ privacy, and questioned the potential bias that could arise in such systems. The concern was that although privacy law and data protection rules might be applied to a data project, there is little confidence that the way algorithms are applied to data subjects in the development of the project will be regulated.

Consent:

Consent in simple terms means one has given uncoerced authorization for something to happen to them. Informed consent is the most cautious, courteous, and ethical form of consent. It demands that the data collector make a significant effort to give participants a sufficient and accurate understanding of how their data will be used. Big data makes the traditional route of informed consent, where consent for data collection was usually given for participation in a single study, difficult to follow. This is because the goal of big data studies, mining, and analytics is to find patterns and trends between data points that were previously inconceivable. This makes it hard to view consent as ‘informed’, since neither the data collector nor the study participant can reasonably know what will be learned from the data and how it could be used.

Broad consent is a revision to the standard of informed consent that pre-approves secondary uses of data. A study on ethics and big data in health raised concerns about consent to the use of health data, including genetic data. The authors explained how creating a healthcare database for future unspecified research, with ethics approval and governance, has led to a rise in academic debate. The debate concerns the legality of such broad consent, even though it suits the longitudinal and epidemiological nature of biobanking (Knoppers & Thorogood, 2017).
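The difference between specific informed consent and broad consent can be sketched as a simple data structure. This is an illustrative sketch only: the `ConsentRecord` class, its fields, and the purpose and category names are hypothetical, and a real system would also need audit trails and legal review.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of what a participant agreed to."""
    participant_id: str
    # Purposes the participant explicitly approved (specific informed consent).
    approved_purposes: set = field(default_factory=set)
    # Categories of future use the participant pre-approved (broad consent).
    broad_categories: set = field(default_factory=set)
    withdrawn: bool = False


def is_use_permitted(consent: ConsentRecord, purpose: str, category: str) -> bool:
    """Return True only if the proposed use is covered and consent is still active."""
    if consent.withdrawn:
        return False
    return purpose in consent.approved_purposes or category in consent.broad_categories


consent = ConsentRecord("p-001", {"diabetes_study"}, {"health_research"})
print(is_use_permitted(consent, "diabetes_study", "health_research"))    # True
print(is_use_permitted(consent, "genome_follow_up", "health_research"))  # True, via broad consent
print(is_use_permitted(consent, "ad_targeting", "marketing"))            # False
```

The second call shows why broad consent is debated: a use the participant never heard about is permitted because it falls within a pre-approved category.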

Confidentiality:

Healthcare professionals have legal and ethical responsibilities to safeguard the confidentiality of their patients’ information. Academics and others involved in human research have legal and ethical duties to guard the privacy of clinical study participants. Federal statutes bind all ASHA (American Speech-Language-Hearing Association) members who treat clients or patients in healthcare facilities, schools, or private practice. The Health Insurance Portability and Accountability Act (HIPAA) privacy and security rules apply to professionals in healthcare facilities. Schools operate under the Family Educational Rights and Privacy Act (FERPA) and HIPAA. Individual states also have statutes governing patient confidentiality.

A study on the right to confidentiality explains how conducting research with children and adolescents as participants raises ethical challenges regarding confidentiality. The researchers described how studies involving younger participants usually apply methodologies aimed at eliciting sensitive information about factors affecting adolescent mental and physical health. These sensitive data could cover sexual activity, smoking, alcohol consumption, illegal drug use, self-harm, and suicidal behaviors. The authors concluded that little guidance is available on what kind of disclosed information might justify breaching confidentiality (Hiriscau et al., 2014).

Transparency:

Transparency grants individuals the right to know how their personal data will be processed (Xafis & Labude, 2019). It allows users to know what factors are captured in the analysis they receive. This level of transparency in big data analytics is important because people’s trust increases when they are informed about how their data are being used in data processing and analytics. Tools can also be provided that let individuals verify the conclusions drawn and correct mistakes. A lack of transparency in big data eventually results in a lack of trust from the customers whose data are being analyzed.

With transparency, an organization’s activities, processes, and data are made open to inspection: the company publishes information about the project being carried out in a complete, open, understandable, easily accessible, and free format. Transparency is very important to an ethical climate within a corporation. Research transparency means studies are conducted openly so that everyone is aware of the outcome of the finished studies. Another example of maintaining transparency is updating customers on how their data is being used and sold.

Openness:

Openness considers the interplay between ethics and the intentional sharing or withholding of data, information, or knowledge. It is important that consumers are able to determine how their data moves from one data system to another. Participants who have consented to allow their anonymized data to be used for future research and teaching purposes might have a change of mind in their decision-making if the anonymity of their personal data cannot be guaranteed. It is therefore necessary to clarify these limitations and be open about them to participants from the very beginning.
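One reason anonymity is hard to guarantee is that common de-identification steps only reduce, rather than remove, the risk of re-identification. The sketch below shows keyed-hash pseudonymization using Python’s standard `hmac` and `hashlib` modules. The key, the record, and the field names are hypothetical, and the technique is offered as an illustration, not as a method described in the cited studies.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


record = {"name": "Jane Doe", "zip_code": "30301", "diagnosis": "asthma"}

shared = {
    "participant": pseudonymize(record["name"]),  # direct identifier replaced
    "zip_code": record["zip_code"][:3] + "**",    # quasi-identifier coarsened
    "diagnosis": record["diagnosis"],
}
print(shared)
```

Pseudonymized data is not anonymous: whoever holds the key can link records back to a person, and the remaining fields may still identify someone when combined with other datasets. These are the kinds of limitations that should be explained to participants from the beginning.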

Openness is very important in research ethics because it gives rise to public benefits. It does so by contributing to wider access to data, publications, and other research materials. This in turn facilitates a broader dissemination of scientific knowledge and a greater return on investment in research data. It can also create more opportunities for replicating and building upon scientific results. A study on openness in big data and data repositories presented some of the issues that arise in the open sharing of big data, and provided insights into how they relate to the nature of data repositories (Xafis & Labude, 2019).

Ownership:

Ownership in big data terms refers to the redistribution of data, the modification of data, and the capacity to gain from data innovation. Ownership of data can be divided into two categories. The first category, the right to control data, allows editing, managing, sharing, and deleting the data. The second category, the right to benefit from data, refers to revenue from the exploitation or sale of data. Corporations that generate data do not necessarily own it. Ownership of patient information as it relates to big data has not been settled within the medical academic community (Mirchev et al., 2020).

In the study on ownership of patient information, the authors sought to determine how medical academic communities perceived ownership of patients’ records in the context of big data (Mirchev et al., 2020). Although they were not clearly distinguished, the study summarized the aspects of medical data ownership as ethical, legal, political, and managerial. The researchers concluded that there was no consensus on the ethical requirement for justice and the necessary legal regulations. They recommended that adequate policy decisions be expressed through relevant legal frameworks, since that would allow the development of the right policies, regulations, and ethical principles to be known, understood, and upheld in the medical field.

Institutional Review Board’s Role In Assessing Risk to Human Subjects

Institutional Review Boards (IRBs) provide independent assessments of proposed research to ensure it is ethically acceptable, check for potential biases, and evaluate compliance with regulations and laws designed to protect human subjects (Grady, 2015). Research ethics committees protect human research participants through initial and periodic independent reviews of the ethical suitability of proposals for human research. Since research methods and opportunities have evolved, the rules, models, methods, and articulation of goals of IRB review must also evolve to keep pace with modern research.

The implementation of IRB principles needs to be optimized to help shape the future structure of the organization, processes, and outcomes of review and oversight by IRBs and related players. Existing data privacy regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) do not directly address ethics, but there is a significant overlap between key privacy requirements, such as lawfulness and accountability, and the principles of AI ethics. Thus, ensuring ethical AI helps ensure data privacy compliance.

Philosophical and Practical Evaluation of Ethical Issues and Access Restriction Limits

The four major issues of information ethics in modern times are privacy, accuracy, property, and accessibility. Privacy considers what information people can keep to themselves and not be forced to reveal to others (Franzke et al., 2021) and, if these data are released, under what conditions and with what safeguards. Accuracy concerns the authenticity, reliability, and precision of the information being presented.

Property concerns the ownership of the information collected. It also covers what a just and fair price is for the exchange of these datasets, and who owns the intellectual property rights to a dataset. Accessibility concerns what information a corporation has a right or privilege to obtain, under what conditions, and with what safeguards. Access to information technology allows organizations to store, convey, and process information.

Philosophical and Practical Evaluation of Cost-benefit Analysis

Cost-benefit analysis is a methodical process that organizations use to decide which options to pursue and which to forgo. The overall goal of a cost-benefit analysis is to determine whether a project is worth undertaking. It adds up the potential rewards expected from an action and subtracts the total costs associated with taking it. In general, cost-benefit analysis uses measurable financial metrics, such as the revenue earned or the profits resulting from the decision to pursue a project (Grover et al., 2018). More complex cost-benefit analyses include sensitivity analysis, discounting of cash flows, and what-if scenario analysis for various options.
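The arithmetic described above, including the discounting of cash flows and a simple sensitivity analysis, can be sketched in Python as follows. The yearly benefit and cost figures and the discount rates are hypothetical, and net present value is used here as one common way of discounting; the cited sources do not prescribe a specific formula.

```python
def net_present_value(benefits, costs, discount_rate):
    """Sum of discounted (benefit - cost) per year; year 0 is not discounted."""
    return sum(
        (benefit - cost) / (1 + discount_rate) ** year
        for year, (benefit, cost) in enumerate(zip(benefits, costs))
    )


# Hypothetical project: an up-front cost in year 0, then yearly benefits and running costs.
benefits = [0, 60_000, 60_000, 60_000]
costs = [100_000, 10_000, 10_000, 10_000]

# Sensitivity analysis: repeat the calculation under different discount rates.
for rate in (0.05, 0.10, 0.25):
    npv = net_present_value(benefits, costs, rate)
    decision = "worth undertaking" if npv > 0 else "not worth undertaking"
    print(f"discount rate {rate:.0%}: NPV = {npv:,.2f} ({decision})")
```

With these figures the project looks worthwhile at the two lower discount rates but not at the highest one, which shows how strongly the conclusion can depend on forecast assumptions.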

The cost-benefit analysis process starts with identifying the project scope, then determining the cost implications and identifying the different underlying principles with their associated benefits, before computing the analysis estimates and finally making and implementing recommendations. Cost-benefit analysis techniques rely on data-driven decision-making, and any recommended outcome has been thoroughly evaluated using information gathered specifically for the identified problem. This makes the findings more thorough and more reliable.

In some instances, when projects are very large and have a long-term scope, a cost-benefit analysis can fail to account for significant financial factors like inflation, interest rates, unstable cash flows, and the present value of money. Another disadvantage of cost-benefit analysis is that it relies mostly on forecasted figures, so when a forecast is wrong, the estimated findings will most likely be faulty. The cost-benefit process also requires capital and resources to gather the data needed for the analysis, which may not be justified for smaller projects.

Implications of Applying Big Data Ethics

Although big data has many benefits, it has also raised new ethical concerns that stem from the need for huge datasets in big data research. Concerns also come from the possibility of analytics programs reflecting human errors (Howe & Elenberg, 2020). Ethical issues in the use of big data for research include bias, which relates to the assumptions of honesty and carefulness, and the risks associated with the publication and reuse of big data, which relate to the principles of openness and efficiency. There can be several ethical challenges, such as a researcher not knowing a participant’s wishes.

Another ethical challenge could be that the individuals involved in the study might have wishes that contradict regulatory requirements. Since different participants are involved in the collection of big datasets, one participant’s wishes might not be the same as another’s. Ethical issues in information technology that relate to big data include personal privacy, access rights, harmful actions, patents, copyright, liability, and piracy. An organization that does not address its ethical issues could see its reputation decline in the eyes of its stakeholders.

Prescription of Innovative Approaches

To ensure big data is used ethically, researchers and data scientists must make sure that the data analyses they carry out are unbiased and without prejudice. Obtaining informed permission from the people whose data were collected is another crucial ethical factor when using big data for analytics. It is very important that these participants understand how their data will be used before they consent to participate in any research work.

Big data, when used ethically, can drive innovation in healthcare (Mirchev et al., 2020). Patients’ data can be used to offer better diagnoses, prevent diseases, and provide real-time alerts for immediate care, for example through telemedicine. This approach can also be used to monitor patients at home and prevent unnecessary hospital visits. Ethics is very important in big data because corporations that do not follow ethical procedures could incur significant reputational and financial costs.

Conclusion:

Big data ethics is very important in the current digital era. The key constructs of big data ethics and the different topics covered in this paper explored the various principles of big data ethics. Companies that apply these key ethical principles of transparency, privacy, fairness, and accountability to their big data analytics and AI models can produce results that are trustworthy. They will be able to utilize their datasets to build even better goodwill and loyalty that will increase brand value and the corporation’s reputation.

Author: Adwoa Osei-Yeboah

References

Franzke, A., Muis, I. M., & Schaefer, M. T. (2021). Data Ethics Decision Aid (DEDA): a dialogical framework for ethical inquiry of AI and data projects in the Netherlands. Ethics and Information Technology, 23(3), 551-567. 10.1007/s10676-020-09577-5

Grady, C. (2015). Institutional Review Boards Purpose and Challenges. Chest, 148(5), 1148-1155. 10.1378/chest.15-0706

Grover, V., Chiang, R. H. L., Liang, T., & Zhang, D. (2018). Creating Strategic Business Value from Big Data Analytics: A Research Framework. Journal of Management Information Systems, 35(2), 388-423. 10.1080/07421222.2018.1451951

Hand, D. J. (2018). Aspects of Data Ethics in a Changing World: Where Are We Now? Big Data, 6(3), 176-190. 10.1089/big.2018.0083

Hiriscau, I. E., Stingelin-Giles, N., Stadler, C., Schmeck, K., & Reiter-Theil, S. (2014). A right to confidentiality or a duty to disclose? Ethical guidance for conducting prevention research with children and adolescents. European Child & Adolescent Psychiatry, 23(6), 409-416. 10.1007/s00787-014-0526-y

Howe, E. G., & Elenberg, F. (2020). Ethical Challenges Posed by Big Data. Icns, 17(10), 24-30. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7819582/pdf/icns_17_10-12_24-30.pdf

Joshi, P. (2015). Analyzing Big Data Tools and Deployment Platforms. International Journal of Multidisciplinary Approach and Studies, 2(2), 4806-4810. 10.22214/ijraset.2018.4787

Knoppers, B. M., & Thorogood, A. M. (2017). Ethics and Big Data in health. Current Opinion in Systems Biology, 4, 53-57. 10.1016/j.coisb.2017.07.001

Kolevski, D., Michael, K., Abbas, R., & Freeman, M. (2021). Cloud computing data breaches: A review of U.S. regulation and data breach notification literature. Paper presented at the 1-7. 10.1109/ISTAS52410.2021.9629173 https://ieeexplore.ieee.org/document/9629173

Mirchev, M., Mircheva, I., & Kerekovska, A. (2020). The Academic Viewpoint on Patient Data Ownership in the Context of Big Data: Scoping Review. Journal of Medical Internet Research, 22(8), e22214. 10.2196/22214

Timofeeva, A. (2019). Big Data Usage in Retail Industry. Izvestia Journal of the Union of Scientists - Varna. Economic Sciences Series, 8(2), 75-82. 10.36997/IJUSV-ESS/2019.8.2.75

Xafis, V., & Labude, M. K. (2019). Openness in Big Data and Data Repositories. Asian Bioethics Review, 11(3), 255-273. 10.1007/s41649-019-00097-z