I decided to simplify things by going back to Google Drive. Using OneDrive or SharePoint or whatever it was turned out to be one tool too many to keep track of. I had made the move from Google Drive to OneDrive a couple of years ago when I realized that Google Drive was stamping our time-lapse image sets with the date and time of upload, which interfered with one of the ways I like to confirm the frequency of image capture for an experiment. It was still possible to get the actual image capture date from the EXIF info, but you couldn’t just sort by that within Google Drive. OneDrive didn’t overwrite the file dates, and we were experimenting with OneNote for lab notes, so it seemed like a good move. Now we’re no longer using OneNote because notebooks were not preserved when students graduated! Fortunately I had backups of the notebooks, but the risk of losing them made OneNote untenable.

I feel like the Google shared drive is better supported by our IT group, and that alone is worth something. The file date problem can be worked around by checking the image EXIF info whenever I need to double-check the interval between images, so even though it isn’t as easy as looking at the file date, at least the capture time isn’t lost permanently. Plus students are already using the shared drive to keep their notes and other info, so it’s easier for them to use the same place for data.
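As a rough sketch of that interval check, assuming the capture times have already been pulled out of the images with some EXIF tool and are in the usual `YYYY:MM:DD HH:MM:SS` layout, something like this would list the gaps between consecutive frames. The function name and the example timestamps are made up for illustration.

```python
from datetime import datetime

# Layout used by the EXIF DateTimeOriginal tag.
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def capture_intervals(exif_timestamps):
    """Return the seconds between consecutive captures, in capture order."""
    times = sorted(datetime.strptime(t, EXIF_FORMAT) for t in exif_timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Hypothetical timestamps for illustration only, not real experiment data.
example = [
    "2021:03:01 10:00:00",
    "2021:03:01 10:00:30",
    "2021:03:01 10:01:00",
]
print(capture_intervals(example))
```

Any gap that differs from the intended interval would point to a dropped or delayed frame.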

The students just log into their student Google account and can access the shared lab drive from there. When I moved things back, I used Transmit, the FTP client from Panic, to log into Google Drive and move large sets of files and folders easily. I think it might be a little more fault-tolerant than the web uploading/downloading interface.

There’s a brief story about the lab in yesterday’s edition of the local paper. It highlights a recent paper I was a co-author on, focused on the development of a pipeline for standardizing analysis of RNA-seq data from spaceflight experiments. Pleased to see some positive local press coverage!

MATATO work notes

Today I worked on making a connection to the Pi and getting a file from a specified directory, which works now. I used the MATLAB command line to connect and download all files in a directory:
mypi = raspi('','piUsername','password');
My current thinking is to transfer the most current file to the Mac, possibly even deleting it from the Pi after the transfer to save space. These should transfer into a unique directory on the Mac; alternatively, they could transfer into a ‘current’ directory, which is then copied and renamed after the experiment.
This approach wouldn’t take as much effort as trying to set everything up on the Pi alone, which would require basically rewriting the script and motor control bits in Python or something to run on the Pi. This may be the way to go in the long run, but for now it makes sense to just improve on what I had before by introducing the higher quality images from the Pi HQ camera.
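To think through the transfer logic, here is a minimal Python sketch of the "newest file into a unique directory" idea. It works on ordinary local folders as a stand-in for the Pi and the Mac, so it is not the MATLAB code I am actually using, and the function names and the `run_` folder naming are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_run_dir(dest_root):
    """Create a timestamped directory for one experiment and return it."""
    run_dir = Path(dest_root) / datetime.now().strftime("run_%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def newest_file(directory):
    """Return the most recently modified regular file, or None if empty."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def transfer_newest(source_dir, run_dir, delete_source=False):
    """Copy the newest file from source_dir into run_dir.

    With delete_source=True the original is removed after the copy,
    which mimics clearing space on the Pi.
    """
    src = newest_file(source_dir)
    if src is None:
        return None
    dest = Path(run_dir) / src.name
    shutil.copy2(src, dest)  # copy2 preserves the modification time
    if delete_source:
        src.unlink()
    return dest
```

The idea would be to call `make_run_dir` once at the start of an experiment and then call `transfer_newest` after each capture. The ‘current’ directory alternative would just use a fixed folder in place of the timestamped one.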

Today I installed Zotero on my newly-refreshed system. It had a bit of work to do to download and sync references in my library, but it didn’t seem to sweat too badly while it was doing it. I’m still sticking with the semi-automated way of adding references by DOI using the ‘Add Item by Identifier’ button on the toolbar after copying the DOI (as text) from the article page. This seems to work great, even better in some ways than using a bookmarklet or extension, which is still not well supported in the new version of Safari.

A while back I read about how to use a CO2 sensor to estimate the air exchange rate in a room. So I bought the sensor recommended there, the Aranet4, and set it up in one of the teaching labs for last Thursday’s and today’s afternoon sessions. I am really surprised and impressed at how little the CO2 level changed during a lab session. At no point did it go above 600 ppm, and most of the time it stayed in the 400s, indicating a very high rate of air exchange in the room.

Graph of CO2 (ppm) as a function of time. Yellow highlights were periods in which the room was occupied by 8 students.
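For a back-of-the-envelope number, here is a sketch using a steady-state mass balance, which may not be the same method as in the piece I read. It assumes outdoor air at 420 ppm and a CO2 generation rate of 0.005 L/s per seated adult. Both values are my assumptions, not measurements. Since the readings never exceeded 600 ppm with 8 students in the room, plugging in 600 ppm gives a lower bound on the outdoor air supply.

```python
def ventilation_rate_l_per_s(occupants, indoor_ppm, outdoor_ppm=420.0,
                             gen_l_per_s_per_person=0.005):
    """Steady-state outdoor air supply (L/s) from a CO2 mass balance.

    outdoor_ppm and gen_l_per_s_per_person are assumed defaults,
    not values measured in the lab.
    """
    excess_fraction = (indoor_ppm - outdoor_ppm) * 1e-6  # ppm -> fraction
    if excess_fraction <= 0:
        raise ValueError("indoor CO2 must exceed outdoor CO2")
    return occupants * gen_l_per_s_per_person / excess_fraction


def air_changes_per_hour(supply_l_per_s, room_volume_m3):
    """Convert an air supply in L/s to air changes per hour."""
    return supply_l_per_s * 3.6 / room_volume_m3  # 1 L/s = 3.6 m^3/h


total = ventilation_rate_l_per_s(occupants=8, indoor_ppm=600)
print(f"at least {total:.0f} L/s total, {total / 8:.0f} L/s per person")
```

Turning this into air changes per hour needs the room volume, which I haven’t measured, so `air_changes_per_hour` is left without an example call.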

Quick notes – PlantCV

I’m dabbling a little with PlantCV, an open-source image analysis package with lots of pre-built features for doing plant image analysis. It can be installed using the Conda package manager, which means I won’t have to mess around with all the dependencies. It can also be installed on a RasPi, which is very interesting. It isn’t clear to me how well it works on roots, as most of the examples I recall seeing focus on leaves and stems. I can’t think of why roots would be a problem, but I’d like to poke around a little with some example root images once I capture them using the RasPi HQ camera next week.

I need to assess my file storage system, as it seems like I’ve accumulated several different storage systems, cloud file providers, etc. over the past several years and I keep feeling the urge to streamline or re-think my approach. One of the key things I need to feel comfortable with is having full local backups of critical data. This means I shouldn’t be storing my research files in iCloud Drive, as I am currently. At the same time, I can probably trim down the number of working directories and archive some of the older stuff in a cloud service and/or on a local hard disk archive, which would save some space on my working laptop drive and still fulfill my need to have control of the data. I need to rationalize this before I lose something! For example, I pay for a Backblaze account, but I’m pretty sure things that are only in iCloud Drive are not being backed up to that account.

Testing the Raspberry Pi High Quality Camera

I dabbled a little today with the RPi high quality camera using the same lens I use for the other image capture systems (a 75 mm fixed C-mount lens) with a 70 mm extension tube. I took a photo of a ruler and measured a 3 mm distance in the image, which spanned 2144 pixels. The vertical distance in the frame was around 4 mm, which seems to be in the neighborhood of what I want for an Arabidopsis root experiment. This may mean I don’t really need a panel of multiple LEDs to achieve even background illumination, at least when using this lens and tube setup for a single root tracking experiment. OG ROTATO scaling is 664 µm per 100 pixels, or 6.64 µm/px. With the RPi HQ camera, 3000 µm spans 2144 pixels, or about 1.4 µm/px. That is roughly 4.75 times finer than the old system.
If I’m going to use it for regular root experiments, I might remove the IR filter that’s glued to the sensor.
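The scaling arithmetic above can be double-checked with a few lines of Python. The only number here that isn’t from my measurements is the 3040 px frame height, which assumes the camera is saving full-resolution 4056 × 3040 images.

```python
def um_per_px(length_um, length_px):
    """Image scale in micrometers per pixel."""
    return length_um / length_px


old_scale = um_per_px(664, 100)    # OG ROTATO: 664 µm per 100 px
new_scale = um_per_px(3000, 2144)  # RPi HQ camera: 3 mm ruler span
ratio = old_scale / new_scale      # how many times finer the new scale is

# Assumption: full-resolution frames are 3040 px tall.
SENSOR_HEIGHT_PX = 3040
frame_height_mm = new_scale * SENSOR_HEIGHT_PX / 1000

print(f"old: {old_scale:.2f} µm/px, new: {new_scale:.2f} µm/px")
print(f"improvement: {ratio:.2f}x, frame height: {frame_height_mm:.2f} mm")
```

Under that assumption the computed frame height agrees with the roughly 4 mm I saw in the photo.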

Photo of a ruler with the high quality RasPi camera

RNA-seq read data processing

I’ve started to dip my toe in the pool of bioinformatics methods, using our recent data sets as an incentive to learn what I’m doing. In reading more about using Salmon, it seems I should build a “decoy-aware” index. The commands below accomplish that.

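# List the genome sequence names (the decoys), keeping only the first field of each FASTA header
grep "^>" <(gunzip -c Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz) | cut -d " " -f 1 > decoys.txt
# Strip the leading ">" from each name (a backup is kept as decoys.txt.bak)
sed -i.bak -e 's/>//g' decoys.txt
# Concatenate the transcriptome first, then the genome, so the decoys come after the transcripts
cat Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz > gentrome.fa.gz
# Build the decoy-aware index using 12 threads (--gencode only matters for GENCODE-style headers)
salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index --gencode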

Building the index pegs the CPU at 100% but does not fill the 8 GB of RAM on this machine. It took about 7 min.

After this I will do another trial run with a few reads and compare the mapping rate from this decoy-aware index with the rate from the raw transcriptome index I used yesterday (I renamed yesterday’s output to quants_1).
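For the comparison itself, a small Python sketch like the one below should be enough. It assumes Salmon’s usual output layout, where each quant directory contains `aux_info/meta_info.json` with a `percent_mapped` field. The name `quants_2` for the new run is just a placeholder.

```python
import json
from pathlib import Path


def mapping_rate(quant_dir):
    """Read the percent of reads mapped from one Salmon quant directory."""
    meta_path = Path(quant_dir) / "aux_info" / "meta_info.json"
    with open(meta_path) as handle:
        return json.load(handle)["percent_mapped"]


def compare_runs(quant_dirs):
    """Return {directory name: percent mapped} for each quant directory."""
    return {Path(d).name: mapping_rate(d) for d in quant_dirs}


if __name__ == "__main__":
    # quants_1 is yesterday's run; quants_2 is a hypothetical name for the new one.
    dirs = [d for d in ("quants_1", "quants_2") if Path(d).is_dir()]
    for name, rate in compare_runs(dirs).items():
        print(f"{name}: {rate:.2f}% mapped")
```

Run from the folder that holds both outputs, this prints one line per run so the two rates can be compared side by side.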

ARM CPUs and Research Software on Macs

There’s some great discussion in the comments section of the article The Case for ARM-Based Macs – TidBITS. Several of the commenters put their finger on one of the issues I haven’t seen much if any discussion about:

In the past fifteen years, a lot of developers have moved to Mac because it provides an X86+unix environment, which is a huge boon when developing software which will eventually deploy to a cloud environment, where Linux on X86 is king. The differences between BSD and Linux notwithstanding, this has made the Mac the machine of choice for a huge community of web and open source developers. We can even use tools like VMware, Virtualbox, Docker, and Kubernetes to mimic our target deployment environments.

This is definitely the case across the sciences, including several areas that overlap with my own work. Being able to install and run various bioinformatics tools and/or image analysis packages locally has allowed me to get a better handle on how these tools work. I presume that an architecture change to ARM-based CPUs will still permit most of these tools to work, but there will almost certainly be a transition cost to recompile and optimize for the new platform. The article Re-engine, Not Re-imagine by Brendan Shanks puts an optimistic spin on the move, essentially arguing that it can and will be invisible to users. Maybe the transition is the time to look more closely at moving some of these tasks off my local machines and into something like CyVerse or some other cluster-for-hire. As a learner though, I’m hesitant to do this because it introduces another layer of abstraction that I’ve found to have its own problems.

Update: This week’s ATP touches on this concern, about whether a processor change would mean a major disruption to Unix-based programs and tools for science, although they focused on other (non-science) Unix command line tools. Their discussion reminded me that the PowerPC to Intel transition also happened in the Mac OS X era, meaning that many such programs had to be recompiled for the new Intel CPUs, and eventually they were. They also mentioned that many Unix command line programs already run on ARM chips, such as on the Raspberry Pi. So things in active development that are open-source likely will make the move fairly quickly.