
"Data Management is 50% of Engineering." I first heard this phrase about 20 years ago, during the first year of my engineering career. There's a lot of truth to that statement. With the explosion in data and devices, I think it might be 60 or 70 percent of engineering by now!
There are so many results, reports, analyses, simulations, measurements and so forth that it's often difficult to draw a clear conclusion from the data. It's annoying and tiresome to spend hours finding, copying and renaming files. It is much easier to see what the data is telling us if the data is easy to find, understand and compare. Good data management also frees up time and energy to spend on finding conclusions and creative solutions.
TURN THIS ---> INTO THIS
IL_test2.s4p ---> Test2_IL_PN100123_SN02.s4p
FEXT_Test4_try2.s4p ---> Test4_FEXT_PN100123_SN02_try2.s4p
NEXT_try3_TEST6.s4p ---> Test6_NEXT_PN100123_SN02_try3.s4p
Here are some simple steps to achieve good data management:
1. Long filenames are good!
- Every file should have unique labels, so that it can be traced to its source and recreated if necessary.
- Adding details in a file header is also very valuable.
- For instance, an s-parameter file is named: Partnumber_batchnumber_32AWG_100ohms.s4p
- The header of the file shows when it was measured and on what equipment.
- Now the user knows what was measured (part number...), when (date and time) and where (which equipment).
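Here is a minimal Python sketch of the idea, assuming the Test/type/PN/SN naming shown in the examples at the top of this post. The function names, the date and the equipment label are hypothetical and only illustrate the pattern. The header lines start with "!", which is how comment lines are marked in Touchstone (.sNp) files.

```python
from datetime import datetime


def build_filename(test, kind, part_number, serial_number, note="", ext="s4p"):
    """Join traceability labels into one descriptive filename."""
    parts = [test, kind, f"PN{part_number}", f"SN{serial_number}"]
    if note:
        parts.append(note)  # optional extra label such as "try2"
    return "_".join(parts) + "." + ext


def build_header(measured_at, equipment):
    """Return comment lines recording when and where the data was taken."""
    return [
        f"! Measured: {measured_at.isoformat(timespec='seconds')}",
        f"! Equipment: {equipment}",
    ]


# Hypothetical values, for illustration only.
name = build_filename("Test4", "FEXT", "100123", "02", note="try2")
header = build_header(datetime(2017, 6, 1, 14, 30), "VNA-1 (hypothetical)")

print(name)
print("\n".join(header))
```

The labels are passed in as separate arguments so that every file gets them in the same order, which is what makes the names easy to sort and parse later.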
2. Use tools to do the work faster and better.
- Use tools that automatically read files, extract traceability information, analyze the data and make reports.
- Once item 1 is implemented, using tools to extract and sort information gets much easier. For instance, take the filename from item 1, extract the part number, 100 ohms impedance, data, etc., and make a report with plots and analysis results.
- If you have some tools that do this, learn them, master them and use them. If not, get them or make your own!
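As a rough illustration of such a tool, the Python sketch below pulls the labels back out of a filename and groups files by part number. The regular expression is an assumption based on the renamed examples at the top of this post (Test number, measurement type, PN, SN, optional note); a different naming scheme would need a different pattern.

```python
import re
from collections import defaultdict

# Assumed layout: Test<N>_<TYPE>_PN<digits>_SN<digits>[_<note>].s4p
PATTERN = re.compile(
    r"^Test(?P<test>\d+)_(?P<kind>[A-Za-z]+)"
    r"_PN(?P<pn>\d+)_SN(?P<sn>\d+)(?:_(?P<note>\w+))?\.s4p$"
)


def parse_name(filename):
    """Return the labels in a filename as a dict, or None if it does not match."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


def group_by_part(filenames):
    """Group filenames by part number; also return the names that did not parse."""
    groups = defaultdict(list)
    unparsed = []
    for filename in filenames:
        info = parse_name(filename)
        if info is None:
            unparsed.append(filename)  # keep track of these rather than dropping them
        else:
            groups[info["pn"]].append(filename)
    return dict(groups), unparsed


files = [
    "Test2_IL_PN100123_SN02.s4p",
    "Test4_FEXT_PN100123_SN02_try2.s4p",
    "IL_test2.s4p",  # old-style name, does not follow the scheme
]
groups, unparsed = group_by_part(files)
print(groups)
print(unparsed)
```

Files that don't follow the scheme are returned separately instead of being silently skipped, so they can be renamed or investigated.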
3. Keep strange or bad data.
- Label them accurately and descriptively, and keep them. These data points are precious because they are great opportunities for learning.
- I've often looked back at strange data a few weeks later, and realized why it was strange and how to fix it. Now I have a dataset to show the problem and how to fix it.
4. When organizing data into groups, use 3 or 4 levels or categories, not too many, not too few.
- Too many levels are hard to navigate and process; too few levels make big piles of miscellaneous data in the same directory.
- Side note: Never use "archive" in the name of a file or directory. Archive is a code word that means "Nobody will look at this ever again."
5. Make readme.txt notes or decoder rings to explain data labels that aren't obvious.
- The person you are most likely to help is yourself, months or years later.
- Here's an example. There's a set of experimental data (simulation, measurement, analysis, whatever) labeled A, B, C, D and E.
- Make a short text file that explains what A, B, C, D and E are.
- It's freeform text, so write whatever helps you describe the data. Put in links to websites or to other filenames, or point to some company database that gives more detail.
measurementA.data - baseline production data for part 123
measurementB.data - baseline product, but with a hole in it
measurementC.data - invalid data, bad electrical connection
measurementD.data - high tension experiment
measurementE.data - updated product, prototype E
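A readme written as one "filename - description" entry per line can also be read back by a script, for example to caption plots. The Python sketch below assumes that one-entry-per-line layout with " - " as the separator; the function name is hypothetical.

```python
def parse_decoder_ring(text):
    """Map each filename to its description, one 'name - description' per line."""
    labels = {}
    for line in text.splitlines():
        name, separator, description = line.partition(" - ")
        if separator:  # ignore blank lines and free-form notes without " - "
            labels[name.strip()] = description.strip()
    return labels


readme_text = """\
measurementA.data - baseline production data for part 123
measurementB.data - baseline product, but with a hole in it
measurementC.data - invalid data, bad electrical connection
"""

labels = parse_decoder_ring(readme_text)
print(labels["measurementC.data"])
```

Lines without the separator are ignored, so the readme can still hold free-form notes alongside the labeled entries.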
To implement these steps quickly and easily, I wrote a couple of tools to copy, move and rename files. For all you MATLAB users out there, the functions "easycopy" and "easyrename" can be downloaded from the MATLAB File Exchange. These functions can take any files and copy/move/rename them to the desired filenames and locations. They provide a way of organizing and labeling files with some automation, so that the user can quickly implement a file and directory naming strategy.
Easycopy can copy or rename a list of filenames to a list of destinations. It can also use wildcards to find a bunch of files and then automatically copy them to new filenames. It can read a list of files and destinations from a spreadsheet or text file and perform a batch of commands all at once. Easyrename does the same things as easycopy, but with a move/rename command.
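For readers who don't use MATLAB, the wildcard-copy idea can be sketched in a few lines of Python. This is not how easycopy is implemented and it does not mirror its interface; the function name, arguments and sample filenames are hypothetical and only show the general technique of finding files by pattern and copying them to labeled names.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_label(src_dir, pattern, dest_dir, label):
    """Copy files matching a wildcard, appending a label to each filename."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).glob(pattern)):
        target = dest / f"{src.stem}_{label}{src.suffix}"
        shutil.copy2(src, target)  # copy2 also keeps the file timestamps
        copied.append(target.name)
    return copied


# Demo in a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    for name in ("Test2_IL.s4p", "Test4_FEXT.s4p", "notes.txt"):
        (raw / name).write_text("! placeholder\n")
    copied = copy_with_label(raw, "*.s4p", Path(tmp) / "labeled", "PN100123_SN02")

print(copied)
```

Swapping `shutil.copy2` for `shutil.move` would give the move/rename variant of the same loop.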
Check out these functions and let me know what you think!
FREE MATLAB DOWNLOADS
https://www.mathworks.com/matlabcentral/fileexchange/63417-easycopy
https://www.mathworks.com/matlabcentral/fileexchange/63586-easyrename