• rockSlayer@lemmy.blahaj.zone
    English
    arrow-up
    41
    ·
    4 days ago

    I’m a data analyst at a medical nonprofit, primarily doing analyses on germline variants for rare forms of cancer. I’m new to this kind of work, but I have a decent educational background in biology.

    Something I’ve learned is that genetics is complicated as hell. A single gene can produce multiple different proteins, and proteins change over time due to somatic variation. Only about 1% of the genome is protein coding; that portion is called the exome. The exome can be affected by variations to start and stop codons, non-coding regions, and untranslated regions. There are entire fields dedicated to genome-wide studies, exomics, transcriptomics, proteomics, phenomics, and probably several others that I don’t know about. The amount of data involved with these fields is in the tebibyte range. Have you ever seen a “small” 3GiB CSV? I have. The filtered and cleaned data frames built from genetic data are over 100 columns wide and have nearly 5 million rows.

    There are companies creating artificial life by generating custom chromosomes. There’s a whole field of computer science dedicated to biological computing, using DNA as a storage medium. There are companies dedicated to simply classifying genes.

    DNA is cool as hell.

    • MrEff@lemmy.world
      English
      arrow-up
      16
      ·
      4 days ago

      If you really want to blow your mind, look into the theoretical alternatives to DNA. We are all taught about RNA and how it is a precursor to DNA, but what if it had gone another way? Look up PNA, PNA-O, or even GNA. If life exists on other worlds, there is a decent chance it follows an xNA structure, but not necessarily DNA.

    • pelespirit@sh.itjust.works
      English
      arrow-up
      18
      arrow-down
      3
      ·
      4 days ago

      There are companies creating artificial life by generating custom chromosomes.

      My dude, not a fun thing to think about who might have control over that. Is it a musk, zuck, cook or epstein?

      • rockSlayer@lemmy.blahaj.zone
        English
        arrow-up
        11
        ·
        4 days ago

        No, none of those guys are involved afaik. The one that made the first breakthrough in artificial life is run by the same dude who competed with the Human Genome Project to map 99% of the human genome. They modified an extremely simple bacterium that only had a few hundred genes.

        • pelespirit@sh.itjust.works
          English
          arrow-up
          7
          arrow-down
          4
          ·
          4 days ago

          We still don’t know what type of person they are. Them being smart and focused on the research doesn’t give them a pass. They might not even care who else has the info.

          • halcyoncmdr@piefed.social
            English
            arrow-up
            3
            arrow-down
            1
            ·
            4 days ago

            Yup. Many Nazi scientists only cared about the research. A lot of medical and physics breakthroughs last century directly resulted from those experiments.

              • halcyoncmdr@piefed.social
                English
                arrow-up
                1
                ·
                2 days ago

                1. Pervitin was an early form of methamphetamine, widely used by the Nazi military. It kept soldiers awake and alert and minimized appetite to stretch rations. Research around it and similar drugs helped further the understanding of addiction and psychological distress.

                2. The Elektroboot was the first electric submarine able to stay submerged for long stretches without needing to vent things like diesel exhaust, and it could even recharge while submerged.

                3. The Intramedullary Rod, an essential part of modern orthopedic surgery to heal broken bones.

                4. The Horten Ho 229 was an early attempt at stealth and flying wing aircraft. While never fully produced, its development led to further research afterward, resulting in modern stealth aircraft and better overall aircraft efficiency, and by extension advances in detection and tracking.

                5. The Enigma Machine was a marvel of cryptographic security. Pretty sure this stands on its own.

                6. Messerschmitt Me 262 was the first mass produced fighter jet. Much of even modern jet propulsion technology stemmed from this research.

                7. 3D Films were used to enhance their propaganda well before Hollywood considered it.

                8. The Z4 Computer was one of the earliest commercial digital computers.

                9. Of course, the V2 rocket. And by extension every Operation Paperclip scientist brought back to the US to develop space technology at NASA, up to and including the Saturn V rocket and Apollo missions.

                10. The jerrycan, for fuel transport. Literally named after the British slang for German soldiers. So useful the Allies adopted it during the war.

                11. Chloroquine, an anti-Malaria drug developed by the Nazis, initially toxic but further refined after the war.

                12. Night vision technology also had massive developments made by their military scientists.

    • ptu@sopuli.xyz
      English
      arrow-up
      3
      ·
      4 days ago

      Interesting, could you enlighten me on what types of data are in those 100 columns? I’m aware of ATGC and thought it would be just one column, but maybe the rest indicate intensity or activity, or what sequence they are part of.

      • rockSlayer@lemmy.blahaj.zone
        English
        arrow-up
        3
        ·
        4 days ago

        Well it varies depending on what the file is meant for. Usually there’s columns like chromosome, variant position, reference nucleotide, observed nucleotide, type of variation, codon sequence, gene name, etc.

        There’s also columns that result from various analyses. In the file I’ve been working on lately, there are columns such as variant impact, level of confidence, pathogenicity, clinical significance, etc.
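To make that column layout a bit more concrete, here is a minimal sketch of streaming a variant table row by row and keeping only the high-impact rows. The column names, the `HIGH`/`MODERATE`/`LOW` labels, and every value in the toy data are invented for illustration and are not taken from the actual file; a real file would be opened from disk instead of from a string.

```python
import csv
import io

# Toy data with hypothetical column names; all values are invented.
SAMPLE = """chrom,pos,ref,alt,variant_type,gene,impact,clinical_significance
1,1000,G,A,missense,GENE_B,MODERATE,uncertain
1,2000,C,T,stop_gained,GENE_A,HIGH,pathogenic
2,3000,A,G,synonymous,GENE_C,LOW,benign
"""

def high_impact_variants(handle, impact="HIGH"):
    """Yield rows whose 'impact' column matches, reading one row at a time."""
    for row in csv.DictReader(handle):
        if row["impact"] == impact:
            row["pos"] = int(row["pos"])  # positions arrive as text
            yield row

rows = list(high_impact_variants(io.StringIO(SAMPLE)))
for row in rows:
    print(row["gene"], row["chrom"], row["pos"], row["ref"], ">", row["alt"])
```

Reading one row at a time like this keeps memory use flat, which matters once the file is several GiB; a data frame library would be the more usual choice when the filtered result fits in memory.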

        • The_v@lemmy.world
          English
          arrow-up
          2
          ·
          4 days ago

          That sounds like a marker file. It’s a bit different than a sequence file.

          Molecular markers are linked to specific sequences in the DNA. These markers are generally close by or in the gene of interest. All the extra columns describe each marker’s characteristics and results. Anyplace in the entire genome where there is a one-nucleotide difference (a polymorphism) can be another marker. There are millions of these and they add up to massive files.

          A sequence file is basically just a long boring sequence of nucleotides and is not that large. The files you use to generate the sequence are another story. Let’s just say they had to wait almost 20 years for computers to get fast enough to process those files in a reasonable time. Those make the marker files look like child’s play.

          • rockSlayer@lemmy.blahaj.zone
            English
            arrow-up
            1
            ·
            4 days ago

            I’m not familiar with the name of the file I’m currently working with tbh. It’s used to create the annotation files for regenie analyses. It has every variant for every gene within the biobank. There’s far more than just missense; there are stop/start gain/loss, splice donor/acceptor, frameshifts, and PTVs. It contains PrimateAI scores, SpliceAI scores, CAVA data, ClinVar data, and more.

    • foofiepie@lemmy.world
      English
      arrow-up
      2
      ·
      4 days ago

      I have no context/knowledge on the topic. Are you saying DNA has that much data that can be extracted from it? If so, that’s nuts.

      • rockSlayer@lemmy.blahaj.zone
        English
        arrow-up
        3
        ·
        4 days ago

        Yes, all that data is derived directly from DNA. It’s a huge amount of information. One copy of the genome in a single human cell translates directly to about 750MiB (roughly 3 billion base pairs at 2 bits each). Now, add in the fact that genomic studies use biobanks, like the UK Biobank, which contains the genetic info of hundreds of thousands of people. The data we can derive from DNA is absolutely massive.
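As a rough back-of-the-envelope check on that size figure, here is a small sketch. It assumes about 3.1 billion base pairs for one copy of the human genome and 2 bits per base (enough to tell A, C, G, and T apart); both numbers are assumptions added here, not something stated in the comment, and the function name is made up for the example.

```python
BASE_PAIRS = 3_100_000_000  # assumed approximate length of one genome copy
BITS_PER_BASE = 2           # four possible bases fit in 2 bits

def genome_size_mib(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Return the raw storage size in MiB for a sequence of the given length."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 2**20

print(f"{genome_size_mib():.0f} MiB")
```

Under those assumptions the result lands in the same ballpark as the figure quoted above. Real sequencing files are far larger, since they also carry quality scores, read names, and many overlapping reads per position.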

        • foofiepie@lemmy.world
          English
          arrow-up
          1
          ·
          4 days ago

          We have 3/4 of a GB of data in every cell? I need to read more into this. Wish I’d bothered with biology at school. 😂

    • Optional@lemmy.world
      English
      arrow-up
      2
      ·
      4 days ago

      That’s too much science. We, as a people, need less sci- wait, no. No, no. Uh - We need bett-er? Science? Hmm.

      Look just make it an animated cartoon with fun music for now and we’ll circle back.